Data visualization of on-time performance, delays, and more
The system fetches real-time train location data from the GTFS-RT (General Transit Feed Specification Realtime) API, authenticating with an API key for the Caltrain operator. Each record includes the train's ID, its latitude and longitude, the stop ID, and the timestamp at which the location was recorded. Collected data from to with data points.
The data is stored in an SQLite database (caltrain_lat_long.db). Records go into a table called train_locations, which is created if it doesn't already exist. Each record is uniquely identified by the combination of timestamp, trip ID, and stop ID, which prevents duplicate entries.
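A minimal sketch of the storage step, assuming Python's built-in sqlite3 module. The column names and the helpers open_db and insert_locations are hypothetical. The composite UNIQUE constraint mirrors the timestamp/trip ID/stop ID key described above, and INSERT OR IGNORE skips any row that would violate it.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not taken from the project.
SCHEMA = """
CREATE TABLE IF NOT EXISTS train_locations (
    trip_id   TEXT NOT NULL,
    latitude  REAL NOT NULL,
    longitude REAL NOT NULL,
    stop_id   TEXT NOT NULL,
    timestamp INTEGER NOT NULL,
    UNIQUE (timestamp, trip_id, stop_id)
)
"""


def open_db(path="caltrain_lat_long.db"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_locations(conn, rows):
    """Insert (trip_id, lat, lon, stop_id, timestamp) tuples, skipping duplicates.

    Returns the number of rows actually inserted; rows ignored because of the
    UNIQUE constraint are not counted by total_changes.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO train_locations "
        "(trip_id, latitude, longitude, stop_id, timestamp) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

Calling insert_locations twice with the same row stores it once, and the second call returns 0. Enforcing uniqueness in the schema rather than in application code keeps repeated polls of the feed idempotent.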
The stored data is then joined with static GTFS data (schedules and stops). The system uses the Haversine formula to calculate the distance between a train's reported position and the location of each scheduled stop, and uses that distance to detect when a train has arrived at a stop.
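The Haversine distance and a closest-approach arrival check could look like the following sketch. The (timestamp, lat, lon) input shape, the helper names, and the 200 m cutoff in detect_arrival are assumptions for illustration, not values from the project.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def detect_arrival(points, stop_lat, stop_lon, max_distance_m=200.0):
    """Return the timestamp of the observation closest to the stop.

    points: iterable of (timestamp, lat, lon) for one trip.
    Returns None if the train never came within max_distance_m
    (a hypothetical threshold) of the stop.
    """
    best_ts, best_d = None, float("inf")
    for ts, lat, lon in points:
        d = haversine_m(lat, lon, stop_lat, stop_lon)
        if d < best_d:
            best_ts, best_d = ts, d
    return best_ts if best_d <= max_distance_m else None
```

The formula treats the Earth as a sphere, which introduces negligible error at the scale of a station platform.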
On-time performance is calculated by comparing the actual arrival time (the timestamp at which the train is closest to the stop) with the scheduled arrival time. The delay for each trip is computed in minutes, and a trip is flagged as "delayed" if it arrives more than 4 minutes late.
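A sketch of the delay calculation, assuming scheduled arrival times come from GTFS stop_times.txt as HH:MM:SS strings and that the actual arrival has already been converted to a naive datetime in the same local time zone. The helper names are hypothetical; the 4-minute threshold is the one stated above. GTFS allows hours of 24 or more for trips that run past midnight, so the time is added to the service date as an offset instead of being parsed with strptime. This treats the service day as starting at midnight, which the GTFS reference notes is not exact on daylight-saving transition days.

```python
from datetime import datetime, timedelta

DELAY_THRESHOLD_MIN = 4  # a trip is "delayed" if more than 4 minutes late


def gtfs_time_to_datetime(service_date, hhmmss):
    """Convert a GTFS HH:MM:SS time (hours may be >= 24) to a datetime.

    service_date: naive datetime at local midnight of the service day
    (assumption: DST-transition days are ignored).
    """
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return service_date + timedelta(hours=h, minutes=m, seconds=s)


def delay_minutes(actual, scheduled):
    """Positive when late, negative when early."""
    return (actual - scheduled).total_seconds() / 60


def is_delayed(delay_min, threshold=DELAY_THRESHOLD_MIN):
    """Strictly greater than the threshold, so exactly 4 minutes counts as on time."""
    return delay_min > threshold
```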
Plotly is used to generate graphs of metrics such as daily on-time performance, delay severity, and delays during the morning and evening commutes. The graphs are saved as HTML files and embedded in a website using iframes.
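The per-day on-time percentage that a daily chart would plot can be computed with the standard library alone. This sketch assumes each trip has already been reduced to a (service_date, delay_minutes) pair. The commute windows in COMMUTE_WINDOWS are hypothetical, since the exact hours are not stated.

```python
from collections import defaultdict

# Hypothetical commute windows as half-open local-hour ranges [start, end).
COMMUTE_WINDOWS = {"morning": (6, 10), "evening": (16, 20)}


def daily_on_time_pct(trips, threshold_min=4):
    """trips: iterable of (service_date, delay_minutes).

    Returns {service_date: percent of trips on time}, ordered by date.
    A trip is on time when its delay is at most threshold_min minutes.
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for day, delay in trips:
        totals[day] += 1
        if delay <= threshold_min:
            on_time[day] += 1
    return {day: 100.0 * on_time[day] / totals[day] for day in sorted(totals)}


def commute_period(hour):
    """Return "morning", "evening", or None for a local hour (0-23)."""
    for name, (start, end) in COMMUTE_WINDOWS.items():
        if start <= hour < end:
            return name
    return None
```

The resulting dictionary could be passed to Plotly, for example `fig = plotly.express.line(x=list(pct), y=list(pct.values()))`, and written out with `fig.write_html("daily_on_time.html", include_plotlyjs="cdn")` for embedding in an iframe. The file name is illustrative.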