Escalator Benchmark
2025-01-31 - Data Visualization, Web Development, Artificial Intelligence

Context

For three weeks in 2015, researchers asked commuters in London not to walk up the escalator, to find out the optimal strategy during rush hour. This seems like a reasonably easy thing to model and simulate without annoying anyone for three weeks. But is it?

In this project, we ask different AI models to do the work of researchers... in less than 5 minutes. The objective is to challenge the claim that large language models can now fully automate scientific research. The research has three dimensions:

  • a modelling challenge (passenger flow)
  • a simulation challenge (high-level game engine)
  • a rendering challenge (UI)

Methodology

LLMs, and reasoners in particular, can be extremely helpful provided that you give them sufficient context. The prompt below was used for all the model outputs presented in this project.

# Project description
I want to build a model to assess the optimal escalator strategy when stations are busy. Let's assume that an escalator is wide enough for two people
-strategy 1 : everyone stands still on the escalator, with two people on each step
-strategy 2 : people who do not want to walk up stand still on the right, and those who want to walk up can do so on the left.
the model should take many inputs, including, total number of people arriving at the bottom of the escalator, lengths and speed of the escalator, percentage of people who want to walk up but also average speed for those who walk up and maybe a distribution of walking speeds to simulate slow downs when slow people walk.
There may be more, please suggest anything that makes sense.
In terms of output, I want to put show this simulator in a react app with tailwind so please provide the code for both the simulator and the UI
# Implementation steps
I am working on a React App with tailwind, here is how I envisage the UI:
## left column Inputs
### Escalator variables
- escalator length
- escalator speed
### People output
- number of people arriving at the bottom of the escalator
- percentage of people walking up (0 for strategy one)
- average speed of people walking up
- normal distribution of walking speed
## Right Column
### A basic animation in javascript to show the escalator
- People can be represented by dots that accumulate at the bottom. In strategy 1, they all accumulate at the bottom, in strategy 2, the space at the bottom is divided in 2 to let people who want to access the escalator.
- As time lapses, the dots progress until they reach the escalator (and move up at different speeds depending on whether they stand or walk). Those who walk, cannot overtake those ahead of them if they are slower (walking speed is selected randomly in normal distribution). It should be a real simulation that can be reset.
### statistics
- A counter for the points that reach the top and statistics of passenger flow per minute
- Maybe a comparison with strategy 1 (which can be modelled without randomisation) in terms of flow over the same period and the number of people at the bottom
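The prompt leaves the modelling choices to the model. To illustrate the non-visual building blocks it asks for (walking speeds drawn from a normal distribution, and a strategy 1 baseline "modelled without randomisation"), here is a minimal JavaScript sketch. The function names, the seeded generator (mulberry32, used here only because Math.random cannot be seeded, which matters for a reset button), the clamping rule and the step-depth parameter are all assumptions of this sketch, not part of the prompt or of any model's answer.

```javascript
// Hypothetical helpers; names and the stepDepthM parameter are assumptions, not from any implementation.

// Seeded PRNG (mulberry32) so a simulation run can be reset and replayed exactly.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function rand() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Walking speed (m/s, relative to the steps) drawn from a normal distribution
// with the Box-Muller transform, clamped at minSpeed so nobody walks backwards.
function sampleWalkingSpeed(rand, mean, sd, minSpeed = 0.1) {
  const u1 = 1 - rand(); // in (0, 1], keeps Math.log finite
  const u2 = rand();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return Math.max(minSpeed, mean + sd * z);
}

// Strategy 1 needs no randomness: while a queue exists, every step carries
// peoplePerStep people and steps reach the top at speed / stepDepth per second.
function strategy1CapacityPerMinute(speedMps, stepDepthM, peoplePerStep = 2) {
  return ((peoplePerStep * speedMps) / stepDepthM) * 60;
}

// Deterministic queue at the bottom after `minutes`, assuming a constant arrival rate.
function strategy1QueueAfter(minutes, arrivalsPerMinute, capacityPerMinute) {
  return Math.max(0, (arrivalsPerMinute - capacityPerMinute) * minutes);
}
```

For example, with a belt speed of 0.5 m/s and an assumed step depth of 0.4 m, `strategy1CapacityPerMinute(0.5, 0.4)` returns 150 people per minute, and with 200 arrivals per minute `strategy1QueueAfter(2, 200, 150)` gives a queue of 100 people after two minutes. These are illustrative inputs, not measurements.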

The responses are screened for security and pasted into the app. If the code in the first response does not work, we ask the LLM to fix the issue until it renders. The number of queries is indicated in the relevant tab.

The project also includes my own implementation (built with the help of AI assistants). It’s a work in progress, which I iterate on when I find time. As of Jan 31, 2025, I have spent around 40 hours on the PITTI implementation. I believe that it is good enough to be shared, but there are critical flaws on the modelling side (the graph approach is neither appropriate nor well implemented). I will consider better options.
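For comparison with a graph approach, one common way to model a single-file lane where nobody can overtake is a discrete time-stepped model: each person advances at their own speed but never gets closer than a minimum gap to the person in front. The sketch below applies that idea to the walking lane of strategy 2. It is a hypothetical illustration, not the PITTI implementation; the function name, the minimum-gap rule and every parameter are assumptions.

```javascript
// Hypothetical sketch: strategy 2 walking lane as a single file with no overtaking.
// Positions are metres along the escalator; speeds are m/s relative to the ground.
// walkSpeeds lists each waiting person's walking speed (relative to the steps), in arrival order.
function simulateWalkingLane({ lengthM, beltSpeedMps, walkSpeeds, minGapM, dtS, durationS }) {
  const lane = []; // people on the escalator; index 0 is closest to the top
  let next = 0;    // index of the next person waiting at the bottom
  let exited = 0;
  const steps = Math.round(durationS / dtS);

  for (let k = 0; k < steps; k++) {
    // Admit the next waiting person only if the entry point is clear.
    const last = lane[lane.length - 1];
    if (next < walkSpeeds.length && (last === undefined || last.x >= minGapM)) {
      lane.push({ x: 0, v: beltSpeedMps + walkSpeeds[next] });
      next++;
    }

    // Update front to back so each person sees the already-updated leader:
    // nobody may get closer than minGapM to the person ahead (no overtaking).
    for (let i = 0; i < lane.length; i++) {
      const unconstrained = lane[i].x + lane[i].v * dtS;
      lane[i].x = i === 0 ? unconstrained : Math.min(unconstrained, lane[i - 1].x - minGapM);
    }

    // Order is preserved, so anyone who reached the top is at the front.
    while (lane.length > 0 && lane[0].x >= lengthM) {
      lane.shift();
      exited++;
    }
  }
  return { exited, onEscalator: lane.length, waiting: walkSpeeds.length - next };
}
```

Swapping the order of the same three people shows the no-overtaking rule at work: on a 10 m escalator moving at 0.5 m/s, with a 1 m gap and a 0.1 s time step, walking speeds `[0.1, 1.5, 1.5]` leave nobody at the top after 10 s because the slow walker leads, whereas `[1.5, 1.5, 0.1]` lets the two fast walkers off.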

The app, with all proposals (AI and human), can be run locally. See the README file for instructions.

Preliminary conclusions

While the models are definitely useful for laying the foundations, the claim that AI models can already fully automate research seems largely overblown. Here, the math is trivial, and any undergraduate with a math background would find a way to incorporate it into the model; it is mostly about making the right choices. On the UI side, it is also clear that you still need human involvement to piece everything together and to give models a little nudge when they start going off-track.
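To make "the math is trivial" concrete, here is a back-of-the-envelope steady-state comparison. It rests on simplifying assumptions introduced here for illustration: the queue at the bottom never empties, speeds are constant, standing passengers fill every step, and walkers keep a headway of k steps. The symbols v (belt speed), d (step depth), w (mean walking speed relative to the steps) and k are not taken from any of the implementations; Q_1 and Q_2 are the people delivered per second under strategies 1 and 2.

```latex
\begin{aligned}
Q_1 &= \frac{2v}{d} \\
Q_2 &= \frac{v}{d} + \frac{v + w}{k\,d} \\
Q_2 > Q_1 &\iff \frac{v + w}{k} > v \iff k < 1 + \frac{w}{v}
\end{aligned}
```

Under these assumptions, if walkers climb at the belt's own speed (w = v), the walking lane only beats a second standing lane when walkers are less than two steps apart. Everything this ignores (splitting the queue at the bottom, spread in walking speeds, slow walkers blocking fast ones) is what the simulations are for.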

To illustrate the takeaways of this project, I mapped each model output (very subjectively and in the most unscientific way possible):

  • X-axis: the UI
  • Y-axis: the reasoning

Each project is scored from 0 to 10 on each axis. A score below 5 means that the model did not really understand the objectives; a score above 8 means that the objectives are met, albeit with room for improvement.

For transparency, I later added a Z-axis to represent time spent. This dimension should not be ignored.

Next steps

Add more models and refine the human approach.

To contribute to the project (human or AI suggestions):

  • Create a new component in src/models
  • Import it in App.jsx (this should be self-explanatory)
  • Submit a PR

Feel free to propose alternative prompts for this project.
