Introduction

A transitscore is a measure of public transit availability at a particular location, on a scale of 0 to 100, where 0 means no public transit at all and 100 is the highest level of public transit availability. The cores of world-class cities like San Francisco, New York, or London would have transitscores of 100.

The transitscore algorithm we've developed calculates a score for a specific point by summing the relative usefulness of nearby routes, where usefulness is a function of a route's distance and the level of service that route represents. The algorithm is therefore divided into two components: a function for determining the usefulness of a route given the distance to its nearest stop, and a function for ranking the level of service of a route.

The distance tailoff function we've chosen is a simple, smooth exponential function derived from a walkability study. This causes transitscores to taper off smoothly as you move away from a transit amenity.

The level of service of a route is proportional to the number of trips that run on that route over the course of a week, times a desirability coefficient associated with the route's mode. Rail, for instance, is consistently chosen over bus transit even when controlling for trip frequency.

The resulting algorithm is simple to implement against data feeds formatted in the Google Transit Feed Specification (GTFS), enabling consistent scoring of all transit agencies that publish in GTFS format.

Pseudocode Algorithm

Because any quantification of transit infrastructure (number of stops, number of weekly trips, etc.) will have its own unique range, it is necessary to normalize the raw transitscore to generate the final transitscore:

    transit_score(location) = normalize_to_100( log( raw_transitscore(location) ), low=0, high=highest_transitscore )

The amount of transit infrastructure can vary by several orders of magnitude.
Scales for measuring things that have an extremely large range of normal values (sound volume, earthquake intensity, etc.) are typically logarithmic. A bus stop in a small town might see three trips a day, whereas downtown Manhattan might see tens of thousands. If Manhattan had a transitscore of 100, then on a linear scale a small town's downtown might have a transitscore of 0.01, whereas a logarithmic scale might rate Manhattan as 100 and the small town as 10. The logarithmic score matches a rider's experience better: the added utility of one additional bus in a small town may exceed that of adding 10 new routes in downtown Manhattan.

    raw_transitscore(location) = sum( [relative_route_usefulness(location, route) for each route] )

    relative_route_usefulness(location, route) = distance_penalty( route_distance(location, route) ) * route_service_level(route)

    route_distance(location, route) = distance to the nearest stop that serves a given route

The distance penalty is a coefficient that scales a value in proportion to how useful that thing is given how far away it is. In the literature this coefficient is sometimes called the "impedance", "friction factor", "utility function", "propensity function", or "gravity function". Typically the distance penalty starts at 1 for 0 distance and diminishes as distance (or travel time) increases. Papers available here and here provide a few impedance functions derived from real-world commuter behavior for automobile and public transit trips. A reasonable impedance function for walking travel, 0.486*exp(-1.683x) with x the distance in kilometers, comes from Iacono et al. Normalizing so that f(0)=1 and converting the input to miles gives:

    distance_penalty(d) = exp(-1.683*(d*1.609344))

Note that since the distance_penalty drops to very near 0 after a few miles, it is possible to calculate the raw transitscore as the sum of the relative route usefulness of every route in the city, or indeed in the whole world, if you wanted.
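To see how quickly the penalty falls off, the distance_penalty formula can be evaluated at a few walking distances. The following is a minimal Python sketch; the constant names and the sample distances are illustrative assumptions and are not taken from the reference implementation.

```python
import math

# Decay rate per kilometer from the walking impedance function quoted in the text.
DECAY_PER_KM = 1.683
KM_PER_MILE = 1.609344


def distance_penalty(miles: float) -> float:
    """Return the distance penalty for a stop `miles` away (1.0 at zero distance)."""
    return math.exp(-DECAY_PER_KM * miles * KM_PER_MILE)


if __name__ == "__main__":
    # Sample distances are arbitrary, chosen only to illustrate the falloff.
    for miles in (0.0, 0.25, 0.5, 1.0, 2.0, 3.0):
        print(f"{miles:4.2f} mi -> {distance_penalty(miles):.4f}")
```

Under this curve the penalty is roughly one half at a quarter mile and below 0.1 at one mile, which is why distant routes contribute almost nothing to the sum.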
In practice, since far-away routes have little impact on a location's transitscore, you don't need to include them.

    route_service_level(route) = mode_preference( mode(route) ) * num_trips_per_week(route)

The mode preference represents the relative attractiveness of different types of transit service by virtue of their mode alone. The mode preference values we've arbitrarily chosen are:

    light rail, subway, rail: 2
    bus: 1
    ferry, cable car, gondola, funicular: 1.5

Maps

Bay Area

Code

A reference implementation of Transitscore is available on GitHub.
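For readers who want to experiment with the pseudocode directly, its pieces can be assembled into a small Python sketch. This is not the reference implementation. The `Route` record, its precomputed `nearest_stop_miles` field, the mode name strings, and the exact form of the normalization (mapping log(raw) linearly onto 0 to 100, clamping raw scores at or below 1 to 0, and taking `highest_raw` from the caller) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable

# Mode preference values as listed in the text.
MODE_PREFERENCE = {
    "light rail": 2.0, "subway": 2.0, "rail": 2.0,
    "bus": 1.0,
    "ferry": 1.5, "cable car": 1.5, "gondola": 1.5, "funicular": 1.5,
}


@dataclass
class Route:
    # Hypothetical record: assumes the distance from the scored location
    # to the nearest stop serving this route has already been computed.
    mode: str
    trips_per_week: int
    nearest_stop_miles: float


def distance_penalty(miles: float) -> float:
    """Exponential walking penalty, equal to 1.0 at zero distance."""
    return math.exp(-1.683 * miles * 1.609344)


def route_service_level(route: Route) -> float:
    """Weekly trips weighted by the preference for the route's mode."""
    return MODE_PREFERENCE[route.mode] * route.trips_per_week


def raw_transitscore(routes: Iterable[Route]) -> float:
    """Sum of distance-penalized service levels over the given routes."""
    return sum(
        distance_penalty(r.nearest_stop_miles) * route_service_level(r)
        for r in routes
    )


def transit_score(raw: float, highest_raw: float) -> float:
    """Assumed normalization: log(raw) scaled onto 0..100 against log(highest_raw)."""
    if raw <= 1.0 or highest_raw <= 1.0:
        return 0.0  # log would be zero or negative; clamp to the low end
    return 100.0 * min(1.0, math.log(raw) / math.log(highest_raw))
```

A call might look like `transit_score(raw_transitscore([Route("bus", 300, 0.1), Route("subway", 1000, 0.5)]), highest_raw=1_000_000)`, where the `highest_raw` value is a placeholder that would in practice come from the best-served location in the data set.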


