Creating a Four-step Transportation Model in Python
A four-step transportation model predicts the traffic load on a network given data about a region. These models are used to evaluate the impacts of land-use and transportation projects. In this example, we will create a model representing California as if it acted as a city. To get started, first we will import the necessary libraries. All of these can be installed from pip.
import requests
import pandas
import geopandas
import json
import math
from haversine import haversine
from ipfn import ipfn
import networkx
from matplotlib import pyplot
from matplotlib import patheffects
Step 0 Gather Data
First we need some data about the study area. Supply data is the transportation network including roads, public transportation schedules, etc. To keep things simple, we are going to assume the transport network is a line connecting the centroid of each zone to the centroid of each other zone.
Supply Data
url = 'https://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/State_County/MapServer/37/query?where=state%3D06&f=geojson'
r = requests.get(url)
zones = geopandas.GeoDataFrame.from_features(r.json()['features'])
centroidFunction = lambda row: (row['geometry'].centroid.y, row['geometry'].centroid.x)
zones['centroid'] = zones.apply(centroidFunction, axis=1)
At this point we can plot our zones and see how they look:
zones.plot()
pyplot.show()
Demand Data
Demand dats is the users of the transportation network. For passenger models, demand data is typically census data including residential locations, work locations, school location, etc. For freight models, demand data could be tons of freight, number of bananas, etc. In this case we will study workers' home locations (from the 2015 American Community Survey (ACS) 5-Year Data) and employees' locations (from the Bureau of Labor Statistics (BLS)). (The BLS API seems to be quite slow...)
url = 'http://api.census.gov/data/2015/acs5/profile?get=NAME,DP03_0018E&for=county&in=state:06'
r = requests.get(url)
Production = pandas.DataFrame(r.json()[1:], columns = r.json()[0], dtype='int')
nameSplit = lambda x: x.split(',')[0]
Production['NAME'] = Production['NAME'].apply(nameSplit)
zones = pandas.merge(zones, Production)
zones['Production'] = zones['DP03_0018E']
def getEmployment(state, county):
  prefix = 'EN'
  seasonal_adjustment = 'U'
  area = format(state, "02d") + format(county, "03d")
  data_type = '1'
  size = '0'
  ownership = '0'
  industry = '10'
  seriesid = prefix + seasonal_adjustment + area + data_type + size + ownership + industry
  headers = {'Content-type': 'application/json'}
  data = json.dumps({"seriesid": [seriesid],"startyear":"2015", "endyear":"2015", "registrationKey": ""})
  p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
  employment = p.json()['Results']['series'][0]['data'][0]['value']
  return(employment)
employment = lambda row: int(getEmployment(row['state'], row['county']))
zones['Attraction'] = zones.transpose().apply(employment)
zones['Production'] = zones['Production'] * zones.sum()['Attraction'] / zones.sum()['Production']
zones.index = zones.NAME
zones.sort_index(inplace=True)
The second to last line makes sure the sums of Production and Attraction are equal. Iterative Proportional Fitting in Trip Distribution will fail if they are not. We assume people with multiple jobs are spread thoughout the study area. Now let's take a look at where our commuters live and work:
pandas.set_option('display.float_format', lambda x: '%.0f' % x)
zones[['Production', 'Attraction']].head()
| Production | Attraction | |
|---|---|---|
| Alameda County | 691588 | 741843 | 
| Alpine County | 380 | 531 | 
| Amador County | 11425 | 11500 | 
| Butte County | 83232 | 79361 | 
| Calaveras County | 15044 | 8888 | 
We can easily see that Alameda and Alpine Counties see an influx of commuters during the day and Butte and Calaveras Counties are the opposite.
Step 1 Trip Generation
Trip Generation is where we compute the numbers for Production and Attraction. We completed this above.
Step 2 Trip Distribution
In Trip Distribution we use a Gravity Model to calculate a cost matrix representing the cost of travel between each pair of zones. First, we create a simple cost function. Then we use that function to calculate our cost matrix by interating through all possible zone pairs. Then we can use our cost matrix to distribute our trips across our study area. Changing the beta parameter adjusts the Friction of Distance. Beta will vary based on the units of distance. We use a Haversine Function to calculate distances in kilometers (or miles) from geographic coordinates. The Trip Distribution function uses Iterative Proportional Fitting to assign trips from our Production and Attraction arrays to our matrix.
def costFunction(zones, zone1, zone2, beta):
  cost = math.exp(-beta * haversine(zones[zone1]['centroid'], zones[zone2]['centroid']))
  return(cost)
def costMatrixGenerator(zones, costFunction, beta):
  originList = []
  for originZone in zones:
    destinationList = []
    for destinationZone in zones:
        destinationList.append(costFunction(zones, originZone, destinationZone, beta))
    originList.append(destinationList)
  return(pandas.DataFrame(originList, index=zones.columns, columns=zones.columns))
def tripDistribution(tripGeneration, costMatrix):
  costMatrix['ozone'] = costMatrix.columns
  costMatrix = costMatrix.melt(id_vars=['ozone'])
  costMatrix.columns = ['ozone', 'dzone', 'total']
  production = tripGeneration['Production']
  production.index.name = 'ozone'
  attraction = tripGeneration['Attraction']
  attraction.index.name = 'dzone'
  aggregates = [production, attraction]
  dimensions = [['ozone'], ['dzone']]
  IPF = ipfn.ipfn(costMatrix, aggregates, dimensions)
  trips = IPF.iteration()
  return(trips.pivot(index='ozone', columns='dzone', values='total'))
beta = 0.01
costMatrix = costMatrixGenerator(zones.transpose(), costFunction, beta)
trips = tripDistribution(zones, costMatrix)
If we take a look at the trips table we can see that most trips stay inside each county, but some go quite far. Origin zones are on the left. Destination zones are on the top.
| Alameda County | Alpine County | Amador County | Butte County | Calaveras County | Colusa County | Contra Costa County | Del Norte County | El Dorado County | Fresno County | Glenn County | Humboldt County | Imperial County | Inyo County | Kern County | Kings County | Lake County | Lassen County | Los Angeles County | Madera County | Marin County | Mariposa County | Mendocino County | Merced County | Modoc County | Mono County | Monterey County | Napa County | Nevada County | Orange County | Placer County | Plumas County | Riverside County | Sacramento County | San Benito County | San Bernardino County | San Diego County | San Francisco County | San Joaquin County | San Luis Obispo County | San Mateo County | Santa Barbara County | Santa Clara County | Santa Cruz County | Shasta County | Sierra County | Siskiyou County | Solano County | Sonoma County | Stanislaus County | Sutter County | Tehama County | Trinity County | Tulare County | Tuolumne County | Ventura County | Yolo County | Yuba County | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alameda County | 128707 | 17 | 667 | 6631 | 487 | 853 | 53313 | 352 | 2785 | 6306 | 794 | 2225 | 8 | 12 | 1046 | 866 | 1392 | 540 | 4519 | 1066 | 11156 | 146 | 2261 | 4466 | 128 | 87 | 14127 | 8389 | 1818 | 1032 | 8716 | 343 | 96 | 56378 | 1188 | 151 | 336 | 70696 | 21329 | 3870 | 50131 | 3160 | 131790 | 13520 | 4663 | 26 | 820 | 17628 | 17322 | 13558 | 2803 | 1417 | 179 | 829 | 571 | 1412 | 11066 | 1337 | 
| Alpine County | 18 | 0 | 2 | 4 | 2 | 0 | 9 | 0 | 11 | 46 | 0 | 1 | 0 | 0 | 8 | 4 | 0 | 2 | 33 | 9 | 2 | 1 | 0 | 8 | 0 | 1 | 8 | 2 | 5 | 8 | 23 | 1 | 1 | 35 | 1 | 2 | 3 | 10 | 13 | 7 | 7 | 8 | 31 | 2 | 3 | 0 | 0 | 4 | 3 | 14 | 1 | 1 | 0 | 8 | 6 | 7 | 3 | 1 | 
| Amador County | 839 | 2 | 84 | 184 | 55 | 11 | 412 | 5 | 339 | 551 | 10 | 24 | 1 | 1 | 85 | 60 | 13 | 43 | 342 | 102 | 80 | 15 | 21 | 244 | 9 | 10 | 273 | 74 | 155 | 80 | 836 | 24 | 9 | 1649 | 33 | 15 | 28 | 480 | 605 | 157 | 327 | 142 | 1379 | 102 | 109 | 3 | 17 | 173 | 136 | 574 | 56 | 24 | 2 | 76 | 65 | 89 | 144 | 48 | 
| Butte County | 7901 | 5 | 174 | 4781 | 115 | 220 | 4434 | 117 | 856 | 1141 | 246 | 594 | 2 | 3 | 175 | 130 | 276 | 316 | 708 | 211 | 1239 | 31 | 454 | 650 | 86 | 23 | 1150 | 1210 | 730 | 165 | 3094 | 186 | 19 | 12193 | 118 | 32 | 57 | 6729 | 3221 | 413 | 3669 | 347 | 9446 | 830 | 2749 | 11 | 421 | 2163 | 2478 | 1979 | 938 | 631 | 59 | 157 | 138 | 191 | 2204 | 610 | 
| Calaveras County | 1066 | 3 | 96 | 211 | 90 | 12 | 514 | 5 | 421 | 888 | 12 | 28 | 1 | 2 | 136 | 97 | 16 | 51 | 551 | 164 | 98 | 24 | 24 | 372 | 10 | 15 | 394 | 89 | 180 | 128 | 967 | 28 | 14 | 1991 | 48 | 23 | 44 | 598 | 772 | 247 | 415 | 228 | 1818 | 134 | 125 | 3 | 19 | 212 | 165 | 797 | 66 | 28 | 3 | 121 | 102 | 144 | 173 | 55 | 
Step 3 Mode Choice
At this point we have a matrix of the number of trips from each zone to each zone. Next we split those trips across the available modes, in this case walking, cycling, and driving. For this we create a Utility Function that describes the utility gained from the trip minus the utility lost due to travel time, cost, and other negative factors associated with the mode. We then use this utility function to determine the probability of taking each mode for each zone pair. Similar to Trip Distribution, we use these probabilities to compute a matrix. We can then multiply our trip matrix by the probability matrices to get the number of trips between each zone pair using a given mode.
def modeChoiceFunction(zones, zone1, zone2, modes):
  distance = haversine(zones[zone1]['centroid'], zones[zone2]['centroid'])
  probability = {}
  total = 0.0
  for mode in modes:
    total = total + math.exp(modes[mode] * distance)
  for mode in modes:
    probability[mode] = math.exp(modes[mode] * distance) / total
  return(probability)
def probabilityMatrixGenerator (zones, modeChoiceFunction, modes):
  probabilityMatrix = {}
  for mode in modes:
    originList = []
    for originZone in zones:
      destinationList = []
      for destinationZone in zones:
	  destinationList.append(modeChoiceFunction(zones, originZone, destinationZone, modes)[mode])
      originList.append(destinationList)
    probabilityMatrix[mode] = pandas.DataFrame(originList, index=zones.columns, columns=zones.columns)
  return(probabilityMatrix)
modes = {'walking': .05, 'cycling': .05, 'driving': .05}
probabilityMatrix = probabilityMatrixGenerator(zones.transpose(), modeChoiceFunction, modes)
drivingTrips = trips * probabilityMatrix['driving']
Now we can look at the number of driving trips between each zone pair.
| Alameda County | Alpine County | Amador County | Butte County | Calaveras County | Colusa County | Contra Costa County | Del Norte County | El Dorado County | Fresno County | Glenn County | Humboldt County | Imperial County | Inyo County | Kern County | Kings County | Lake County | Lassen County | Los Angeles County | Madera County | Marin County | Mariposa County | Mendocino County | Merced County | Modoc County | Mono County | Monterey County | Napa County | Nevada County | Orange County | Placer County | Plumas County | Riverside County | Sacramento County | San Benito County | San Bernardino County | San Diego County | San Francisco County | San Joaquin County | San Luis Obispo County | San Mateo County | Santa Barbara County | Santa Clara County | Santa Cruz County | Shasta County | Sierra County | Siskiyou County | Solano County | Sonoma County | Stanislaus County | Sutter County | Tehama County | Trinity County | Tulare County | Tuolumne County | Ventura County | Yolo County | Yuba County | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alameda County | 42902 | 6 | 222 | 2210 | 162 | 284 | 17771 | 117 | 928 | 2102 | 265 | 742 | 3 | 4 | 349 | 289 | 464 | 180 | 1506 | 355 | 3719 | 49 | 754 | 1489 | 43 | 29 | 4709 | 2796 | 606 | 344 | 2905 | 114 | 32 | 18793 | 396 | 50 | 112 | 23565 | 7110 | 1290 | 16710 | 1053 | 43930 | 4507 | 1554 | 9 | 273 | 5876 | 5774 | 4519 | 934 | 472 | 60 | 276 | 190 | 471 | 3689 | 446 | 
| Alpine County | 6 | 0 | 1 | 1 | 1 | 0 | 3 | 0 | 4 | 15 | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 1 | 11 | 3 | 1 | 0 | 0 | 3 | 0 | 0 | 3 | 1 | 2 | 3 | 8 | 0 | 0 | 12 | 0 | 1 | 1 | 3 | 4 | 2 | 2 | 3 | 10 | 1 | 1 | 0 | 0 | 1 | 1 | 5 | 0 | 0 | 0 | 3 | 2 | 2 | 1 | 0 | 
| Amador County | 280 | 1 | 28 | 61 | 18 | 4 | 137 | 2 | 113 | 184 | 3 | 8 | 0 | 0 | 28 | 20 | 4 | 14 | 114 | 34 | 27 | 5 | 7 | 81 | 3 | 3 | 91 | 25 | 52 | 27 | 279 | 8 | 3 | 550 | 11 | 5 | 9 | 160 | 202 | 52 | 109 | 47 | 460 | 34 | 36 | 1 | 6 | 58 | 45 | 191 | 19 | 8 | 1 | 25 | 22 | 30 | 48 | 16 | 
| Butte County | 2634 | 2 | 58 | 1594 | 38 | 73 | 1478 | 39 | 285 | 380 | 82 | 198 | 1 | 1 | 58 | 43 | 92 | 105 | 236 | 70 | 413 | 10 | 151 | 217 | 29 | 8 | 383 | 403 | 243 | 55 | 1031 | 62 | 6 | 4064 | 39 | 11 | 19 | 2243 | 1074 | 138 | 1223 | 116 | 3149 | 277 | 916 | 4 | 140 | 721 | 826 | 660 | 313 | 210 | 20 | 52 | 46 | 64 | 735 | 203 | 
| Calaveras County | 355 | 1 | 32 | 70 | 30 | 4 | 171 | 2 | 140 | 296 | 4 | 9 | 0 | 1 | 45 | 32 | 5 | 17 | 184 | 55 | 33 | 8 | 8 | 124 | 3 | 5 | 131 | 30 | 60 | 43 | 322 | 9 | 5 | 664 | 16 | 8 | 15 | 199 | 257 | 82 | 138 | 76 | 606 | 45 | 42 | 1 | 6 | 71 | 55 | 266 | 22 | 9 | 1 | 40 | 34 | 48 | 58 | 18 | 
Step 4 Route Assignment
At this point we have a matrix of all trips from each zone to each zone by mode. We could use this information to calculate mode share percentages. However, we would also like to see how the trips look on the transportation network. For that, we create a graph to represent the network. Then we calculate the shortest path for each trip and add all the trips to the network ignoring capacity contraints. Last, we can visualize our trips and see how the traffic is distributed. The width of the line between centroids show the volume of traffic.
def routeAssignment(zones, trips):
  G = networkx.Graph()
  G.add_nodes_from(zones.columns)
  for zone1 in zones:
    for zone2 in zones:
      if zones[zone1]['geometry'].touches(zones[zone2]['geometry']):
        G.add_edge(zone1, zone2, distance = haversine(zones[zone1]['centroid'], zones[zone2]['centroid']), volume=0.0)
  for origin in trips:
    for destination in trips:
      path = networkx.shortest_path(G, origin, destination)
      for i in range(len(path) - 1):
        G[path[i]][path[i + 1]]['volume'] = G[path[i]][path[i + 1]]['volume'] + trips[zone1][zone2]
  return(G)
def visualize(G, zones):
  fig = pyplot.figure(1, figsize=(10, 10), dpi=90)
  ax = fig.add_subplot(111)
  zonesT = zones.transpose()
  zonesT.plot(ax = ax)
  for i, row in zones.transpose().iterrows():
    text = pyplot.annotate(s=row['NAME'], xy=((row['centroid'][1], row['centroid'][0])), horizontalalignment='center', fontsize=6)
    text.set_path_effects([patheffects.Stroke(linewidth=3, foreground='white'), patheffects.Normal()])
  for zone1 in G.edge:
    for zone2 in G.edge[zone1]:
      volume = G.edge[zone1][zone2]['volume']
      x = [zones[zone1]['centroid'][1], zones[zone2]['centroid'][1]]
      y = [zones[zone1]['centroid'][0], zones[zone2]['centroid'][0]]
      ax.plot(x, y, color='#444444', linewidth=volume/10000, solid_capstyle='round', zorder=1)
  pyplot.show(block=False)
G = routeAssignment(zones.transpose(), drivingTrips)
visualize(G, zones.transpose())
Here are our trips:
 
Obviously the scale of this example is quite ridiculous. Because the cost of travel is so low, our model is telling us that there will be many long distance trips. And because we used centroid-to-centroid routes, there is no concept of geography. This makes the route through the east of the state the fastest path north to south. However, this is the simple method used by transportation planners around the world to predict travel patterns. If you'd like to play with the parameters, here are all the functions:
# Trip Distribution
beta = 0.001
costMatrix = costMatrixGenerator(zones.transpose(), costFunction, beta)
trips = tripDistribution(zones, costMatrix)
# Mode Choice
modes = {'walking': .05, 'cycling': .05, 'driving': .05}
probabilityMatrix = probabilityMatrixGenerator(zones.transpose(), modeChoiceFunction, modes)
drivingTrips = trips * probabilityMatrix['driving']
# Route Assignment
G = routeAssignment(zones.transpose(), drivingTrips)
visualize(G, zones.transpose())
That's all folks. Questions? Improvements?













Reply to this article.