David Bailey

Visualizing Public Transportation Speeds with Python


When people choose between driving and public transportation for urban trips, travel time, together with service frequency, is among the most influential factors in their decision. Here we use Python to compute the scheduled (timetable) speed between consecutive stops and look for slow spots. First we load some libraries:

import requests
import pandas
import tempfile
import datetime
import gtfstk
import folium
import haversine
from functools import reduce
from multiprocessing import Pool

Next we create some helper functions. url2gtfs extends the gtfstk library so that it can load a GTFS feed directly from a URL, without saving the file by hand first. parseTime deals with time fields that run past 24:00:00 in timetables. tripToSegments splits each public transportation vehicle's "trip" into "segments" between consecutive stops. Finally, plotSegment draws a segment on the map.

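def url2gtfs(url):
    r = requests.get(url)
    r.raise_for_status()  # fail early if the download did not succeed
    with tempfile.NamedTemporaryFile() as f:
        f.write(r.content)  # public attribute, instead of the private r._content
        f.flush()  # make sure the bytes are on disk before gtfstk opens the file by name
        return gtfstk.read_gtfs(f.name, dist_units='mi')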
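As a rough illustration of what is being loaded here: a GTFS feed is a zip archive of CSV text files such as stops.txt and stop_times.txt. The sketch below reads one table from zip bytes held in memory using only the standard library. The helper name readGtfsTable is hypothetical (it is not part of gtfstk), and the two-stop feed is made up for the example.

```python
import csv
import io
import zipfile


def readGtfsTable(zipBytes, tableName):
    """Return the rows of one GTFS table (e.g. 'stops.txt') as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zipBytes)) as archive:
        with archive.open(tableName) as raw:
            # utf-8-sig drops the byte order mark that some feeds include
            text = io.TextIOWrapper(raw, encoding='utf-8-sig', newline='')
            return list(csv.DictReader(text))


# Build a made-up, two-stop feed in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('stops.txt',
                     'stop_id,stop_name,stop_lat,stop_lon\n'
                     'A,First Stop,34.00,-118.40\n'
                     'B,Second Stop,34.01,-118.39\n')

stops = readGtfsTable(buffer.getvalue(), 'stops.txt')
print([stop['stop_id'] for stop in stops])
```

With a real feed, the bytes would come from the downloaded response rather than from a hand-built archive. All values come back as strings, so coordinates would still need converting to float.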


def parseTime(time): # to deal with times after midnight
    hours, minutes, seconds = time.split(':')
    if int(hours) > 23:
        # roll over to day 2 and subtract 24 from the hour
        return datetime.datetime.strptime('2:%d:%s:%s' % (int(hours) - 24, minutes, seconds),
                                          '%d:%H:%M:%S')
    else:
        return datetime.datetime.strptime('1:' + time, '%d:%H:%M:%S')
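To illustrate why times past 24:00:00 need special handling: GTFS times are relative to the service day, so a trip that leaves at 11:50 pm and arrives at ten past midnight is written as 23:50:00 to 24:10:00. A minimal alternative sketch, using a hypothetical helper gtfsTimeToSeconds rather than parseTime, converts such strings to seconds so that a duration is a plain subtraction:

```python
def gtfsTimeToSeconds(time):
    """Convert a GTFS 'HH:MM:SS' string, where HH may exceed 23, to seconds."""
    hours, minutes, seconds = (int(part) for part in time.strip().split(':'))
    return hours * 3600 + minutes * 60 + seconds


departure = gtfsTimeToSeconds('23:50:00')
arrival = gtfsTimeToSeconds('24:10:00')
print(arrival - departure)  # 1200 seconds, i.e. 20 minutes
```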


def tripToSegments(trip_id):
    # sort by stop_sequence so that consecutive rows are consecutive stops
    stop_times = gtfs.stop_times[gtfs.stop_times['trip_id'] == trip_id].sort_values('stop_sequence')
    segments = []
    for i in range(len(stop_times)-1):
        # .iloc[0] turns the one-row match into a single row, so the fields below are scalars
        origin = gtfs.stops[gtfs.stops['stop_id'] == stop_times['stop_id'].iloc[i]].iloc[0]
        destination = gtfs.stops[gtfs.stops['stop_id'] == stop_times['stop_id'].iloc[i+1]].iloc[0]
        # straight-line distance in miles; recent haversine releases use
        # unit=haversine.Unit.MILES in place of miles=True (check your installed version)
        distance = haversine.haversine((origin['stop_lat'], origin['stop_lon']),
                         (destination['stop_lat'], destination['stop_lon']),
                         miles=True)
        startTime = parseTime(stop_times['departure_time'].iloc[i])
        stopTime = parseTime(stop_times['arrival_time'].iloc[i+1])
        duration = (stopTime - startTime).seconds
        if duration == 0:
            speed = 0 # when the laws of physics do not apply
        else:
            speed = distance / duration * 60 * 60 # miles per second to miles per hour
        segments.append({'origin_id': origin['stop_id'],
                         'destination_id': destination['stop_id'],
                         'distance': distance,
                         'duration': duration,
                         'speed': speed})
    return segments
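As a worked check of the speed formula in tripToSegments, here is a self-contained sketch that does not need the haversine package. The function names haversineMiles and speedMph and the Earth radius constant are assumptions made for this illustration. The 0.910273 mile distance and 240 second duration are taken from the first row of the results table further down.

```python
import math

EARTH_RADIUS_MILES = 3958.76  # mean Earth radius; an assumed constant for this sketch


def haversineMiles(origin, destination):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def speedMph(distanceMiles, durationSeconds):
    """Miles per hour, or 0 when the timetable gives a zero duration."""
    if durationSeconds == 0:
        return 0
    return distanceMiles / durationSeconds * 3600


# One degree of longitude along the equator is roughly 69 miles.
print(round(haversineMiles((0, 0), (0, 1)), 1))
# A 0.910273 mile segment scheduled to take 240 seconds.
print(round(speedMph(0.910273, 240), 2))
```

The second figure comes out at about 13.65 mph, which agrees to within rounding with the minimum speed listed for stop pair 80121 to 80123.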

def plotSegment(segment):
    origin = gtfs.stops[gtfs.stops['stop_id'] == segment.name[0]]
    destination = gtfs.stops[gtfs.stops['stop_id'] == segment.name[1]]
    folium.PolyLine(locations=[(origin.stop_lat.iloc[0], origin.stop_lon.iloc[0]),
                               (destination.stop_lat.iloc[0], destination.stop_lon.iloc[0])],
                    popup=str(segment.speed['min']),
                    weight=segment.speed['min']
                   ).add_to(foliumMap)

Here we specify the General Transit Feed Specification (GTFS) feed for the public transportation agency we want to analyze. Then we can look at the routes in that feed.

url = 'https://gitlab.com/LACMTA/gtfs_rail/raw/master/gtfs_rail.zip' # Los Angeles County Metro
# url = 'http://github.com/transitland/gtfs-archives-not-hosted-elsewhere/raw/master/amtrak.zip' # Amtrak
gtfs = url2gtfs(url)
gtfs.routes
   route_id  route_short_name  route_long_name          route_desc  route_type  route_color  route_text_color  route_url
0  801       NaN               Metro Blue Line (801)    NaN         0           004DAC       FFFFFF            http://www.metro.net/around/rail/blue-line/
1  802       NaN               Metro Red Line (802)     NaN         1           EE3A43       FFFFFF            http://www.metro.net/around/rail/red-line/
2  803       NaN               Metro Green Line (803)   NaN         0           2EAB00       FFFFFF            http://www.metro.net/around/rail/green-line/
3  804       NaN               Metro Gold Line (804)    NaN         0           DA7C20       FFFFFF            http://www.metro.net/around/rail/gold-line/
4  806       NaN               Metro Expo Line (806)    NaN         0           0177A5       FFFFFF            http://www.metro.net/around/rail/expo-line/
5  805       NaN               Metro Purple Line (805)  NaN         1           9561A9       FFFFFF            http://www.metro.net/around/rail/purple-line/

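Next we pick the route we want to analyze and the direction (0.0 or 1.0). We filter the GTFS trips for that route and direction, split each trip into segments, and group the segments by stop pair. The resulting table gives summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the distance in miles, the duration in seconds and the speed in miles per hour between each pair of stops.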

route_id = '806'
direction_id = 1.0

trip_ids = gtfs.trips[(gtfs.trips['route_id'] == route_id) & (gtfs.trips['direction_id'] == direction_id)]['trip_id']
p = Pool(8)
segments = reduce(lambda x, y: x + y, p.map(tripToSegments, trip_ids))
segmentsDF = pandas.DataFrame(segments).groupby(['origin_id', 'destination_id']).describe()
segmentsDF
Columns: origin_id, destination_id; then distance (count, mean, std, min, 25%, 50%, 75%, max); duration (count, mean, ..., 75%, max); speed (count, mean, std, min, 25%, 50%, 75%, max)
80121 80123 1021.0 0.910273 2.221534e-16 0.910273 0.910273 0.910273 0.910273 0.910273 1021.0 230.401567 ... 240.0 240.0 1021.0 14.249816 6.204441e-01 13.654098 13.654098 13.654098 14.895380 14.895380
80122 80121 1021.0 0.685952 1.110767e-16 0.685952 0.685952 0.685952 0.685952 0.685952 1021.0 177.747307 ... 180.0 240.0 1021.0 14.122148 1.640630e+00 10.289281 13.719042 13.719042 15.433922 15.433922
80123 80124 1021.0 0.547751 1.110767e-16 0.547751 0.547751 0.547751 0.547751 0.547751 1021.0 170.401567 ... 180.0 180.0 1021.0 11.612220 6.844721e-01 10.955026 10.955026 10.955026 12.324404 12.324404
80124 80125 1021.0 0.512530 1.110767e-16 0.512530 0.512530 0.512530 0.512530 0.512530 1021.0 110.401567 ... 120.0 120.0 1021.0 16.851746 1.537103e+00 15.375900 15.375900 15.375900 18.451081 18.451081
80125 80126 1021.0 0.332504 5.553836e-17 0.332504 0.332504 0.332504 0.332504 0.332504 1021.0 110.401567 ... 120.0 120.0 1021.0 10.932578 9.971961e-01 9.975122 9.975122 9.975122 11.970147 11.970147
80126 80127 1021.0 0.994773 4.443068e-16 0.994773 0.994773 0.994773 0.994773 0.994773 1021.0 170.401567 ... 180.0 180.0 1021.0 21.089003 1.243073e+00 19.895470 19.895470 19.895470 22.382403 22.382403
80127 80128 1021.0 1.526340 2.221534e-16 1.526340 1.526340 1.526340 1.526340 1.526340 1021.0 230.401567 ... 240.0 240.0 1021.0 23.893999 1.040357e+00 22.895103 22.895103 22.895103 24.976476 24.976476
80128 80129 1021.0 0.638851 0.000000e+00 0.638851 0.638851 0.638851 0.638851 0.638851 1021.0 110.401567 ... 120.0 120.0 1021.0 21.005114 1.915945e+00 19.165524 19.165524 19.165524 22.998628 22.998628
80129 80130 1021.0 0.522143 0.000000e+00 0.522143 0.522143 0.522143 0.522143 0.522143 1021.0 110.401567 ... 120.0 120.0 1021.0 17.167829 1.565934e+00 15.664301 15.664301 15.664301 18.797161 18.797161
80130 80131 1021.0 0.977151 2.221534e-16 0.977151 0.977151 0.977151 0.977151 0.977151 1021.0 110.401567 ... 120.0 120.0 1021.0 32.128270 2.930524e+00 29.314534 29.314534 29.314534 35.177441 35.177441
80131 80132 1021.0 0.971643 2.221534e-16 0.971643 0.971643 0.971643 0.971643 0.971643 1021.0 139.196866 ... 160.0 160.0 1021.0 25.651948 3.642507e+00 21.861969 21.861969 29.149292 29.149292 29.149292
80132 80133 1021.0 0.879870 3.332301e-16 0.879870 0.879870 0.879870 0.879870 0.879870 1021.0 110.401567 ... 120.0 120.0 1021.0 28.929703 2.638773e+00 26.396092 26.396092 26.396092 31.675310 31.675310
80133 80134 1021.0 1.273532 2.221534e-16 1.273532 1.273532 1.273532 1.273532 1.273532 1021.0 230.401567 ... 240.0 240.0 1021.0 19.936435 8.680424e-01 19.102987 19.102987 19.102987 20.839622 20.839622
80134 80135 1021.0 0.561476 2.221534e-16 0.561476 0.561476 0.561476 0.561476 0.561476 1021.0 110.401567 ... 120.0 120.0 1021.0 18.461071 1.683894e+00 16.844283 16.844283 16.844283 20.213140 20.213140
80135 80136 1021.0 1.098758 2.221534e-16 1.098758 1.098758 1.098758 1.098758 1.098758 1021.0 170.401567 ... 180.0 180.0 1021.0 23.293455 1.373012e+00 21.975161 21.975161 21.975161 24.722056 24.722056
80136 80137 999.0 0.963716 1.110779e-16 0.963716 0.963716 0.963716 0.963716 0.963716 999.0 110.390390 ... 120.0 120.0 999.0 31.689751 2.890390e+00 28.911472 28.911472 28.911472 34.693766 34.693766
80137 80138 999.0 0.725943 1.110779e-16 0.725943 0.725943 0.725943 0.725943 0.725943 999.0 50.390390 ... 60.0 60.0 999.0 54.020643 1.088630e+01 43.556596 43.556596 43.556596 65.334894 65.334894
80138 80139 999.0 0.892721 0.000000e+00 0.892721 0.892721 0.892721 0.892721 0.892721 999.0 240.000000 ... 240.0 240.0 999.0 13.390820 3.554493e-15 13.390820 13.390820 13.390820 13.390820 13.390820
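To make the aggregation step concrete, the following standard-library sketch mimics, for the speed column only, what the reduce and groupby(...).describe() calls do: it flattens the per-trip lists of segments and then summarizes speed per stop pair. The helper name summarizeSpeeds and the sample segments are invented for illustration, and pandas reports more statistics than are computed here.

```python
from collections import defaultdict
from itertools import chain
from statistics import mean


def summarizeSpeeds(segmentsPerTrip):
    """Group segment dicts by (origin_id, destination_id) and summarize speed."""
    speeds = defaultdict(list)
    # chain.from_iterable flattens the list of per-trip lists in one pass
    for segment in chain.from_iterable(segmentsPerTrip):
        pair = (segment['origin_id'], segment['destination_id'])
        speeds[pair].append(segment['speed'])
    return {pair: {'count': len(values), 'min': min(values),
                   'mean': mean(values), 'max': max(values)}
            for pair, values in speeds.items()}


# Two made-up trips over the same pair of segments.
trips = [
    [{'origin_id': 'A', 'destination_id': 'B', 'speed': 10.0},
     {'origin_id': 'B', 'destination_id': 'C', 'speed': 30.0}],
    [{'origin_id': 'A', 'destination_id': 'B', 'speed': 14.0},
     {'origin_id': 'B', 'destination_id': 'C', 'speed': 30.0}],
]
summary = summarizeSpeeds(trips)
print(summary[('A', 'B')])
```

Flattening with chain.from_iterable also avoids repeatedly copying the growing list, which is what adding lists pairwise does.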

Finally we plot the route, with the width of each line set to the minimum scheduled speed (in miles per hour) between that stop pair. The Expo Line really is much slower into downtown than along the rest of the line.

foliumMap = folium.Map(location=[34, -118], zoom_start=9)
segmentsDF.apply(plotSegment, axis=1)
foliumMap
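Because the line weight is the raw speed, the fastest stop pair in the table is drawn several times thicker than the slowest. One possible refinement, sketched here as an assumption rather than as part of the original script, is to clamp the weight and pick a color by speed band. The helper name segmentStyle, the 15 and 25 mph thresholds, and the colors are all hypothetical choices.

```python
def segmentStyle(speed, maxWeight=10):
    """Return a dict with a pixel weight and color for a speed in mph."""
    # Scale so that 60 mph or more maps to maxWeight, with a floor of 1 pixel.
    weight = max(1, min(maxWeight, round(speed / 60 * maxWeight)))
    if speed < 15:
        color = 'red'
    elif speed < 25:
        color = 'orange'
    else:
        color = 'green'
    return {'weight': weight, 'color': color}


print(segmentStyle(9.975122))   # lowest minimum speed in the table
print(segmentStyle(43.556596))  # highest minimum speed in the table
```

The resulting values could be passed to folium.PolyLine as its weight and color arguments.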

This script is also available as a Jupyter Notebook.

