Visualizing Public Transportation Speeds with Python
When people choose between driving and public transportation for urban trips, travel time (along with service frequency) is the most influential factor in their decision. Here we use Python to compute the timetable speed of public transportation between consecutive stops and look for slow spots. First we import the libraries we need:
import requests
import pandas
import tempfile
import datetime
import gtfstk
import folium
import haversine
from functools import reduce
from multiprocessing import Pool
Next we create some helper functions. We make a function url2gtfs to extend the gtfstk library so it can load a feed directly from a URL without saving it by hand first. We create parseTime to deal with time fields that go past 24 hours in timetables. Next we make tripToSegments to split up each public transportation vehicle's "trip" into "segments" between consecutive stops. And last we create a function, plotSegment, to plot each segment.
def url2gtfs(url):
    # Download the feed to a temporary file so gtfstk can read it from disk.
    r = requests.get(url)
    with tempfile.NamedTemporaryFile() as f:
        f.write(r.content)
        f.flush()  # make sure every byte is on disk before gtfstk opens the file
        return gtfstk.read_gtfs(f.name, dist_units='mi')

def parseTime(time):
    # GTFS times can run past 24:00:00 for trips that continue after midnight,
    # so those are mapped onto day 2 of a dummy two-day calendar.
    hours, minutes, seconds = time.split(':')
    if int(hours) > 23:
        return datetime.datetime.strptime('2:' + str(int(hours) - 24) + ':' + minutes + ':' + seconds, '%d:%H:%M:%S')
    else:
        return datetime.datetime.strptime('1:' + time, '%d:%H:%M:%S')

def tripToSegments(trip_id):
    # Stops must be in travel order for consecutive rows to form segments.
    stop_times = gtfs.stop_times[gtfs.stop_times['trip_id'] == trip_id].sort_values('stop_sequence')
    segments = []
    for i in range(len(stop_times) - 1):
        origin = gtfs.stops[gtfs.stops['stop_id'] == stop_times['stop_id'].iloc[i]]
        destination = gtfs.stops[gtfs.stops['stop_id'] == stop_times['stop_id'].iloc[i + 1]]
        # Straight-line distance in miles between the two stops.
        # Newer haversine releases take unit=haversine.Unit.MILES instead of miles=True.
        distance = haversine.haversine((origin['stop_lat'].iloc[0], origin['stop_lon'].iloc[0]),
                                       (destination['stop_lat'].iloc[0], destination['stop_lon'].iloc[0]),
                                       miles=True)
        startTime = parseTime(stop_times['departure_time'].iloc[i])
        stopTime = parseTime(stop_times['arrival_time'].iloc[i + 1])
        duration = (stopTime - startTime).seconds
        if duration == 0:
            speed = 0  # when the laws of physics do not apply
        else:
            speed = distance / duration * 60 * 60  # miles per second -> miles per hour
        segments.append({'origin_id': origin['stop_id'].iloc[0],
                         'destination_id': destination['stop_id'].iloc[0],
                         'distance': distance,
                         'duration': duration,
                         'speed': speed})
    return segments

def plotSegment(segment):
    # segment.name is the (origin_id, destination_id) index of the grouped row.
    origin = gtfs.stops[gtfs.stops['stop_id'] == segment.name[0]]
    destination = gtfs.stops[gtfs.stops['stop_id'] == segment.name[1]]
    folium.PolyLine(locations=[(origin.stop_lat.iloc[0], origin.stop_lon.iloc[0]),
                               (destination.stop_lat.iloc[0], destination.stop_lon.iloc[0])],
                    popup=str(segment.speed['min']),
                    weight=segment.speed['min']
                    ).add_to(foliumMap)
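As a side note on the time handling, GTFS writes times for trips that run past midnight with hours of 24 or more (for example 24:02:00). The following is a minimal sketch of another way to handle this, converting each time to seconds since the start of the service day instead of building datetime objects. The function name gtfs_time_to_seconds is hypothetical and is not part of gtfstk or of the script above.

```python
def gtfs_time_to_seconds(time_str):
    """Convert a GTFS 'HH:MM:SS' string to seconds since the start of the service day.

    Hours may be 24 or more for trips that run past midnight.
    """
    hours, minutes, seconds = (int(part) for part in time_str.split(':'))
    return hours * 3600 + minutes * 60 + seconds


# A segment that departs before midnight and arrives after it.
departure = gtfs_time_to_seconds('23:58:00')
arrival = gtfs_time_to_seconds('24:02:00')
duration = arrival - departure  # seconds between the two stops
print(duration)
```

With plain integers, the duration of a segment is a simple subtraction, and no dummy day number is needed.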
Here we specify the General Transit Feed Specification (GTFS) feed for the public transportation agency we want to analyze. Then we can look at the routes in that feed.
url = 'https://gitlab.com/LACMTA/gtfs_rail/raw/master/gtfs_rail.zip'  # Los Angeles County Metro
# url = 'http://github.com/transitland/gtfs-archives-not-hosted-elsewhere/raw/master/amtrak.zip'  # Amtrak
gtfs = url2gtfs(url)
gtfs.routes
| | route_id | route_short_name | route_long_name | route_desc | route_type | route_color | route_text_color | route_url |
---|---|---|---|---|---|---|---|---|
0 | 801 | NaN | Metro Blue Line (801) | NaN | 0 | 004DAC | FFFFFF | http://www.metro.net/around/rail/blue-line/ |
1 | 802 | NaN | Metro Red Line (802) | NaN | 1 | EE3A43 | FFFFFF | http://www.metro.net/around/rail/red-line/ |
2 | 803 | NaN | Metro Green Line (803) | NaN | 0 | 2EAB00 | FFFFFF | http://www.metro.net/around/rail/green-line/ |
3 | 804 | NaN | Metro Gold Line (804) | NaN | 0 | DA7C20 | FFFFFF | http://www.metro.net/around/rail/gold-line/ |
4 | 806 | NaN | Metro Expo Line (806) | NaN | 0 | 0177A5 | FFFFFF | http://www.metro.net/around/rail/expo-line/ |
5 | 805 | NaN | Metro Purple Line (805) | NaN | 1 | 9561A9 | FFFFFF | http://www.metro.net/around/rail/purple-line/ |
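For readers unfamiliar with the format, a GTFS feed is a zip archive of comma-separated text files such as routes.txt, trips.txt, stops.txt and stop_times.txt. The sketch below shows, using only the standard library, roughly what a loader has to do to produce a table like the one above. The helper read_gtfs_table is a hypothetical name for illustration, not the way gtfstk does it, and the tiny in-memory feed is made up so the example runs without a download.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_bytes, table_name):
    """Return the rows of one GTFS text file (e.g. 'routes.txt') as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig drops a byte-order mark if the file starts with one
            text = io.TextIOWrapper(raw, encoding='utf-8-sig', newline='')
            return list(csv.DictReader(text))


# Build a tiny in-memory feed (illustrative data only).
# The two rows reuse route ids and names shown in the table above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr(
        'routes.txt',
        'route_id,route_long_name,route_type\n'
        '801,Metro Blue Line (801),0\n'
        '806,Metro Expo Line (806),0\n',
    )

routes = read_gtfs_table(buffer.getvalue(), 'routes.txt')
print([route['route_id'] for route in routes])
```

Note that every value comes back as a string here, which is why the route is later selected with route_id = '806' rather than the integer 806.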
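Next we pick the route we want to analyze and the direction (0.0 or 1.0). We filter the GTFS data for that route and direction, split each trip into segments, and group the segments by stop pair. The resulting table gives summary statistics (count, mean, standard deviation, min, quartiles and max) of the distance in miles, the duration in seconds and the speed in miles per hour for each stop pair.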
route_id = '806'
direction_id = 1.0
trip_ids = gtfs.trips[(gtfs.trips['route_id'] == route_id) & (gtfs.trips['direction_id'] == direction_id)]['trip_id']
p = Pool(8)
segments = reduce(lambda x, y: x + y, p.map(tripToSegments, trip_ids))
segmentsDF = pandas.DataFrame(segments).groupby(['origin_id', 'destination_id']).describe()
segmentsDF
origin_id | destination_id | distance count | distance mean | distance std | distance min | distance 25% | distance 50% | distance 75% | distance max | duration count | duration mean | ... | duration 75% | duration max | speed count | speed mean | speed std | speed min | speed 25% | speed 50% | speed 75% | speed max |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
80121 | 80123 | 1021.0 | 0.910273 | 2.221534e-16 | 0.910273 | 0.910273 | 0.910273 | 0.910273 | 0.910273 | 1021.0 | 230.401567 | ... | 240.0 | 240.0 | 1021.0 | 14.249816 | 6.204441e-01 | 13.654098 | 13.654098 | 13.654098 | 14.895380 | 14.895380 |
80122 | 80121 | 1021.0 | 0.685952 | 1.110767e-16 | 0.685952 | 0.685952 | 0.685952 | 0.685952 | 0.685952 | 1021.0 | 177.747307 | ... | 180.0 | 240.0 | 1021.0 | 14.122148 | 1.640630e+00 | 10.289281 | 13.719042 | 13.719042 | 15.433922 | 15.433922 |
80123 | 80124 | 1021.0 | 0.547751 | 1.110767e-16 | 0.547751 | 0.547751 | 0.547751 | 0.547751 | 0.547751 | 1021.0 | 170.401567 | ... | 180.0 | 180.0 | 1021.0 | 11.612220 | 6.844721e-01 | 10.955026 | 10.955026 | 10.955026 | 12.324404 | 12.324404 |
80124 | 80125 | 1021.0 | 0.512530 | 1.110767e-16 | 0.512530 | 0.512530 | 0.512530 | 0.512530 | 0.512530 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 16.851746 | 1.537103e+00 | 15.375900 | 15.375900 | 15.375900 | 18.451081 | 18.451081 |
80125 | 80126 | 1021.0 | 0.332504 | 5.553836e-17 | 0.332504 | 0.332504 | 0.332504 | 0.332504 | 0.332504 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 10.932578 | 9.971961e-01 | 9.975122 | 9.975122 | 9.975122 | 11.970147 | 11.970147 |
80126 | 80127 | 1021.0 | 0.994773 | 4.443068e-16 | 0.994773 | 0.994773 | 0.994773 | 0.994773 | 0.994773 | 1021.0 | 170.401567 | ... | 180.0 | 180.0 | 1021.0 | 21.089003 | 1.243073e+00 | 19.895470 | 19.895470 | 19.895470 | 22.382403 | 22.382403 |
80127 | 80128 | 1021.0 | 1.526340 | 2.221534e-16 | 1.526340 | 1.526340 | 1.526340 | 1.526340 | 1.526340 | 1021.0 | 230.401567 | ... | 240.0 | 240.0 | 1021.0 | 23.893999 | 1.040357e+00 | 22.895103 | 22.895103 | 22.895103 | 24.976476 | 24.976476 |
80128 | 80129 | 1021.0 | 0.638851 | 0.000000e+00 | 0.638851 | 0.638851 | 0.638851 | 0.638851 | 0.638851 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 21.005114 | 1.915945e+00 | 19.165524 | 19.165524 | 19.165524 | 22.998628 | 22.998628 |
80129 | 80130 | 1021.0 | 0.522143 | 0.000000e+00 | 0.522143 | 0.522143 | 0.522143 | 0.522143 | 0.522143 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 17.167829 | 1.565934e+00 | 15.664301 | 15.664301 | 15.664301 | 18.797161 | 18.797161 |
80130 | 80131 | 1021.0 | 0.977151 | 2.221534e-16 | 0.977151 | 0.977151 | 0.977151 | 0.977151 | 0.977151 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 32.128270 | 2.930524e+00 | 29.314534 | 29.314534 | 29.314534 | 35.177441 | 35.177441 |
80131 | 80132 | 1021.0 | 0.971643 | 2.221534e-16 | 0.971643 | 0.971643 | 0.971643 | 0.971643 | 0.971643 | 1021.0 | 139.196866 | ... | 160.0 | 160.0 | 1021.0 | 25.651948 | 3.642507e+00 | 21.861969 | 21.861969 | 29.149292 | 29.149292 | 29.149292 |
80132 | 80133 | 1021.0 | 0.879870 | 3.332301e-16 | 0.879870 | 0.879870 | 0.879870 | 0.879870 | 0.879870 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 28.929703 | 2.638773e+00 | 26.396092 | 26.396092 | 26.396092 | 31.675310 | 31.675310 |
80133 | 80134 | 1021.0 | 1.273532 | 2.221534e-16 | 1.273532 | 1.273532 | 1.273532 | 1.273532 | 1.273532 | 1021.0 | 230.401567 | ... | 240.0 | 240.0 | 1021.0 | 19.936435 | 8.680424e-01 | 19.102987 | 19.102987 | 19.102987 | 20.839622 | 20.839622 |
80134 | 80135 | 1021.0 | 0.561476 | 2.221534e-16 | 0.561476 | 0.561476 | 0.561476 | 0.561476 | 0.561476 | 1021.0 | 110.401567 | ... | 120.0 | 120.0 | 1021.0 | 18.461071 | 1.683894e+00 | 16.844283 | 16.844283 | 16.844283 | 20.213140 | 20.213140 |
80135 | 80136 | 1021.0 | 1.098758 | 2.221534e-16 | 1.098758 | 1.098758 | 1.098758 | 1.098758 | 1.098758 | 1021.0 | 170.401567 | ... | 180.0 | 180.0 | 1021.0 | 23.293455 | 1.373012e+00 | 21.975161 | 21.975161 | 21.975161 | 24.722056 | 24.722056 |
80136 | 80137 | 999.0 | 0.963716 | 1.110779e-16 | 0.963716 | 0.963716 | 0.963716 | 0.963716 | 0.963716 | 999.0 | 110.390390 | ... | 120.0 | 120.0 | 999.0 | 31.689751 | 2.890390e+00 | 28.911472 | 28.911472 | 28.911472 | 34.693766 | 34.693766 |
80137 | 80138 | 999.0 | 0.725943 | 1.110779e-16 | 0.725943 | 0.725943 | 0.725943 | 0.725943 | 0.725943 | 999.0 | 50.390390 | ... | 60.0 | 60.0 | 999.0 | 54.020643 | 1.088630e+01 | 43.556596 | 43.556596 | 43.556596 | 65.334894 | 65.334894 |
80138 | 80139 | 999.0 | 0.892721 | 0.000000e+00 | 0.892721 | 0.892721 | 0.892721 | 0.892721 | 0.892721 | 999.0 | 240.000000 | ... | 240.0 | 240.0 | 999.0 | 13.390820 | 3.554493e-15 | 13.390820 | 13.390820 | 13.390820 | 13.390820 | 13.390820 |
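To make the units in this table concrete, here is a small worked check of the speed arithmetic, written as a sketch with only the standard library. The names haversine_miles and speed_mph are hypothetical stand-ins for the haversine library call and the inline formula in tripToSegments, and the Earth radius used is an approximation, so distances may differ slightly from the library's.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def speed_mph(distance_miles, duration_seconds):
    """Timetable speed in miles per hour; 0 when the scheduled duration is 0."""
    if duration_seconds == 0:
        return 0.0
    return distance_miles / duration_seconds * 3600


# First row of the table: 0.910273 miles, with a maximum duration of 240 seconds.
# The longest duration should reproduce the row's minimum speed.
print(round(speed_mph(0.910273, 240), 3))
```

The result agrees with the minimum speed reported for the first stop pair (about 13.65 mph), which confirms that distance is in miles, duration in seconds and speed in miles per hour.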
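And finally we plot the route, drawing each segment with a line width equal to the minimum timetable speed (in miles per hour) between its two stops. Clicking a segment shows that speed in a popup. The Expo Line really is much slower into downtown than along the rest of the line.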
foliumMap = folium.Map(location=[34, -118], zoom_start=9)
segmentsDF.apply(plotSegment, axis=1)
foliumMap
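Because the speed is passed straight through as the line weight, the fastest segment in the table (a minimum of about 44 mph) is drawn roughly 44 pixels wide. One possible refinement, sketched here under assumed values, is to scale speeds into a bounded range of widths first. The function speed_to_weight and all of its default parameters are hypothetical choices for illustration, not part of folium or of the script above.

```python
def speed_to_weight(speed_mph, min_weight=2.0, max_weight=12.0, max_speed_mph=60.0):
    """Map a speed to a line width in pixels, clamped to [min_weight, max_weight]."""
    # Fraction of the assumed top speed, limited to the range 0..1.
    fraction = max(0.0, min(speed_mph / max_speed_mph, 1.0))
    return min_weight + fraction * (max_weight - min_weight)


# Example speeds: stopped, slow, fast, and beyond the assumed top speed.
for speed in (0.0, 13.4, 43.6, 90.0):
    print(speed, round(speed_to_weight(speed), 1))
```

In plotSegment this would replace weight=segment.speed['min'] with weight=speed_to_weight(segment.speed['min']), keeping slow segments visible and fast ones from covering the map.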
This script is also available as a Jupyter Notebook.