Does Increasing the Number of Cyclists
Reduce the Accident Rate?
and a similar consideration of
Smeed's Law

Review of:

Safety in Numbers: More Walkers and Bicyclists,
Safer Walking and Bicycling
by Peter Lyndon Jacobsen; Injury Prevention; 2003;9:205-209

Review by John Forester


Two different types of error: Arguing from correlation, and Mathematical Artifact

Jacobsen argues from correlation to causation. That is, because X and Y are commonly observed together, he concludes that X causes Y, choosing the relationship that fits a preconceived hypothesis. This is a major error, covered by the standard caution in statistics: "Correlation does not demonstrate causality." Jacobsen makes a second error in the way that he presents and plots his data: any data at all, including random data, plotted as he does, with Accidents/Cyclist against Cyclists/Population, will produce the same quasi-hyperbolic or decreasing-power pattern. These two errors are discussed separately.

Jacobsen states his conclusions:

"A motorist is less likely to collide with a person walking and bicycling when there are more people walking or bicycling. Modeling this relationship as a power curve yields the result that at the population level, the number of motorists colliding with people walking or bicycling will increase at roughly 0.4 power of the number of people walking or bicycling. For example, a community doubling its walking can expect a 32% increase in injuries (2^0.4 = 1.32). Taking into account the amount of walking and bicycling, the probability that a motorist will strike an individual person walking or bicycling declines with the roughly -0.6 power of the number of persons walking or bicycling. An individual's risk while walking in a community with twice as much walking will reduce to 66% (2^0.4/2 = 2^-0.6 = 0.66). Accordingly, policies that increase the numbers of people walking and bicycling appear to be an effective route to improving the safety of people walking and bicycling."

I confine my remarks to the bicycling aspects, with which I am more familiar, although the same basic criticism applies to Jacobsen's method as applied to walking.
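
The arithmetic in Jacobsen's quoted conclusion can be checked directly. The sketch below restates it, writing x for the amount of walking or bicycling and I for the number of injuries; these symbols are introduced here for illustration and are not Jacobsen's notation.

```latex
\begin{align*}
I &\propto x^{0.4}
  &&\Rightarrow\quad \frac{I(2x)}{I(x)} = 2^{0.4} \approx 1.32
  \quad\text{(total injuries up about 32\%)} \\
\frac{I}{x} &\propto x^{0.4 - 1} = x^{-0.6}
  &&\Rightarrow\quad \frac{I(2x)/2x}{I(x)/x} = \frac{2^{0.4}}{2} = 2^{-0.6} \approx 0.66
  \quad\text{(individual risk falls to about 66\%)}
\end{align*}
```

The two exponents are one claim stated two ways: dividing the injury count by the amount of travel subtracts exactly one from the exponent.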

Correlation does not demonstrate Causation

That statement of conclusions leads one to believe that it had been demonstrated that increasing the volume of bicycling at some location produced a less-than-proportionate increase in the number of car-bike collisions. In fact, Jacobsen reports no such event, and I know of no such report anywhere. All that Jacobsen has investigated are the accident rates in different areas or at different times with differing amounts of bicycling. The most that he can show is a correlation between the two sets of data, because he makes no investigation into any causal relationship.

Jacobsen's Data

Jacobsen compares six sets of data which fall into three classes, presenting them as graphs as well as a table.

Proportion of bicycling trips to work [abscissa] against (Injuries/Population)/(Bike trips/Total Trips) [ordinate]. 68 Cities in California.

Amount of bicycling (Km or trips/population/day) [abscissa] against (Injuries or fatalities/Km) [ordinate]. Bicycling in 47 Danish towns, 14 European nations, 8 European nations.

Amount of bicycling (Km/year) [abscissa] against (fatalities/Km) [ordinate] for UK 1950-1999, Netherlands 1980-1998.

Each of these six graphs shows the accident rate declining as the amount of bicycling increases.

The California Cities Data

Consider the 68 California cities. I don't know which cities Jacobsen chose, but we can compare representative cities, say Palo Alto as a city with a high cycling modal split and Bakersfield as a city with a low one. Jacobsen claims that the most significant force for reducing the accident rate is the bicycling modal split, such that increasing the bicycling modal split will reduce the accident rate. But some city characteristics make bicycle commuting easier: Palo Alto has an equable climate, while Bakersfield is where they boast of frying eggs on the sidewalk. Other city characteristics produce more bicycle commuting: high-quality universities, high-tech industries, crowded geographical space, large and dense office employment, government offices, varied cultural life, and others. Many of these characteristics contribute both to bicycle commuting and to lower car-bike collision rates. In short, some cities are more suited to bicycle commuting than others, and those that are more suited have lower bicycling accident rates.

If one could change Bakersfield to be like Palo Alto, then I think it possible that the bicycling modal split and the car-bike collision rate in Bakersfield would change to be similar to those in Palo Alto. However, that is impossible. Recreate San Francisco twenty miles north of Bakersfield? Recreate Stanford, with its scientific history? Replace the current Bakersfield residents, or change their social attitudes? Constrict the Central Valley to the scope of the Bayshore strip? Blow cool ocean air from the Golden Gate to Bakersfield? With enormous effort, you might get something like Austin, Texas.

Now consider increasing the bicycle modal split in Bakersfield by some form of fiat, without changing the other characteristics of Bakersfield. I offer the favorite change proposed by people such as Jacobsen: the production of bike lanes. That might increase the very small bicycling modal split in Bakersfield. Would it change the social attitudes of Bakersfieldians so as to lower the car-bike collision rate? That hasn't been shown anywhere, and there is no reason to suspect that it would happen there. If the addition of more inexperienced cyclists reduced the average level of cyclist skill, the rate might well increase.

The point of this discussion is that there are many interlinked characteristics that produce both the bicycle modal split and the car-bike collision rate, and nobody has been able to isolate the effects of any one. Jacobsen has simply chosen the characteristic that he desires.

The Other Comparisons

The same argument applies to all the comparisons of different areas in Jacobsen's samples, with more besides, because there are greater systematic differences between European nations than there are between California cities, which all have much the same transportation system. Some nations have sidepath systems, some do not; some have dense cities, some have less dense cities; some have higher incomes than others. All of these affect the data.

An almost identical argument applies to the two time series that Jacobsen uses. Consider the UK series from 1950 to 1999. In 1950, Britain was barely motorized, with 25% of trips being by bicycle in 1952. Popular motorization started about 1962. (I was there.) Motorization produced enormous social and geographical change, including a reduction in cycling skills. Nobody has been able to isolate the effects of any of these changes. But look at Jacobsen's graph. The cyclist fatality rate climbs from 1950 to 1972 as bicycling transportation declines, and then decreases back to near its original rate without any significant change in miles bicycled. The graph itself disproves Jacobsen's hypothesis. The Netherlands graph shows the opposite, with about a 30% increase in cycling distance from 1980 to 1998 and about a 75% reduction in fatalities. However, there is no explanation of why these changes occurred, how they are related, or whether they could be exported to other nations.

Standard Correlation Caution

It must always be remembered that correlation does not demonstrate causation. Bicycle transportation is a system within the surface transportation system, which in turn sits within systems of geographical constraints, social conditions, and historical facts. In complicated social systems of this kind, it is very difficult to separate out the effects of any one factor. To do so requires a strong independent demonstration of the causal link, which the correlation data can then only support or disprove, never prove. Jacobsen has not demonstrated any causal link and has not even tried to do so. He has simply chosen to assert that the most significant factor is the one that suits his agenda, nothing more than that.

The quasi-hyperbolic or decreasing power plot is a Mathematical Artifact

Each of Jacobsen's plots is of the form N/C vs C/P, where N = number of accidents, C = number of cyclists, P = size of population. Note that C, the number of cyclists, appears as the denominator of N/C and also appears as the numerator of C/P. Jacobsen sometimes uses number of trips or number of miles travelled instead of number of cyclists, but in each case the mathematical pattern is the same, with the variable representing the amount of cycling appearing in both ratios, once as denominator and once as numerator.

The plot of any such data presented in this form will produce a quasi-hyperbolic or decreasing-power pattern that suggests, to the ill-informed, a decrease of accidents with increase of cyclists according to a power of less than one. (Jacobsen found 0.4 to be the power.) The reason is that C, the number of cyclists, appears in both N/C and C/P. Therefore, when C is large, then N/C tends to be small and C/P tends to be large, and when C is small, then N/C tends to be large while C/P tends to be small. I do not know if a mathematical proof of this has been made, but I present a demonstration of this effect using purely random data that produce the quasi-hyperbolic plot pattern.
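
A short sketch of why the downward slope appears, offered as an illustration rather than a formal proof. It assumes (an assumption made here, not a statement from Jacobsen's paper) that N, C, and P are statistically independent and that their logarithms have finite, nonzero variance. The symbols x, y, and β are introduced for the sketch: x and y are the logarithms of the two plotted ratios, and β is the least-squares slope on log-log axes, that is, the exponent in N/C ∝ (C/P)^β.

```latex
\begin{align*}
y &= \log\frac{N}{C} = \log N - \log C,
\qquad
x = \log\frac{C}{P} = \log C - \log P \\
\operatorname{Cov}(x, y) &= -\operatorname{Var}(\log C)
  \quad\text{(all cross terms vanish under independence)} \\
\operatorname{Var}(x) &= \operatorname{Var}(\log C) + \operatorname{Var}(\log P) \\
\beta &= \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
       = -\,\frac{\operatorname{Var}(\log C)}{\operatorname{Var}(\log C) + \operatorname{Var}(\log P)}
\end{align*}
```

Under these assumptions β is always negative and lies between -1 and 0, whatever the data mean. If log C and log P happen to have equal variance, β is exactly -1/2. The shared variable C alone is enough to produce a decreasing power pattern.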

I obtained 300 random 4-digit numbers by running down the column of the last four digits of the telephone numbers in a standard telephone book, in which the numbers are arranged in the alphabetical order of subscribers' names, and entered them into the Quattro Pro spreadsheet program. I assert that these numbers are sufficiently random for our purposes. (The telephone company might not use a few particular numbers for some reason or another, but the absence of a few from the population of 9,999 is insufficient to destroy randomness beyond what is needed here.) I arranged these in three columns of 100 each, forming 100 triplets of random numbers. I then labeled the first column to represent the population size, the second column to represent the number of cyclists, and the third column to represent the number of accidents. I then set up two more columns with values calculated from those in the first three columns: one column is Cyclists/Population, the other is Accidents/Cyclist. I then plotted the calculated values with Accidents/Cyclist as the ordinate (Y axis) and Cyclists/Population as the abscissa (X axis), just as Jacobsen has done for his plots.
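
The procedure described above can be approximated in a few lines of Python. This is a minimal sketch, not the original spreadsheet: a seeded pseudo-random generator stands in for the telephone-book digits, values are drawn from 1 to 9999 so that no ratio divides by zero, and the function names `simulate_ratio_plot` and `loglog_slope` are hypothetical. The least-squares fit on log-log axes is an addition made here to put a number on the pattern; the original demonstration relied on the plot alone.

```python
import math
import random


def simulate_ratio_plot(n_rows=100, seed=0):
    """Return (Cyclists/Population, Accidents/Cyclist) pairs from random triplets."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_rows):
        # Three independent values; the range 1..9999 is an assumption
        # chosen to avoid division by zero and log(0).
        population = rng.randint(1, 9999)
        cyclists = rng.randint(1, 9999)
        accidents = rng.randint(1, 9999)
        points.append((cyclists / population, accidents / cyclists))
    return points


def loglog_slope(points):
    """Least-squares slope of log(y) on log(x): the fitted power-law exponent."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


points = simulate_ratio_plot()
slope = loglog_slope(points)
print(f"fitted exponent on random data: {slope:.2f}")
```

Because the three columns are drawn from the same distribution, the fitted exponent should settle near -0.5 as the number of rows grows, even though the "accidents" have no relationship to the "cyclists". That is the same sign as, and not far from, the -0.6 exponent quoted from Jacobsen.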

The plotted result exhibits the quasi-hyperbolic or decreasing-power pattern, as shown below. Note that the plotted points that are off the scale of the plotting diagram are all close to either the X axis or the Y axis, with only one exception, the point at (7.5, 10.9). Ninety-nine out of one hundred points are inside the expected envelope of a decreasing power function.

 

The fact that purely random data, organized in the same pattern that Jacobsen used, produce this quasi-hyperbolic or decreasing-power pattern demonstrates that the pattern is an artifact of the method of presentation and not a characteristic of the original information. Therefore, Jacobsen's hypothesis is not supported by his data; it is a mathematical artifact of his method of presentation. One should not give it any credence until other evidence of a different nature is discovered that demonstrates it without this mathematical defect.

Similarity to Smeed's Law and Comparison Thereof

Jacobsen copied his technique of analysis from Smeed's Law, which has been considered for many decades to provide a reasonably accurate description of the relationship between traffic deaths or other accidents and the degree of motorization of a society. In the original presentation, Smeed plotted Deaths/Vehicle against Vehicles/Population. Others have since adapted Smeed's Law to slightly different variables, such as distance traveled by motor vehicles instead of number of motor vehicles, with similar results. Note that in each case the same variable appears as the denominator of one ratio and as the numerator of the other ratio against which it is plotted. Thus, Smeed's Law suffers the same defect as does Jacobsen's Hypothesis, in that its results would be produced even by random data, as is demonstrated for Jacobsen's Hypothesis. Therefore, Smeed's Law has no more validity than does Jacobsen's Hypothesis.

The data sheets that are the printout from the spreadsheet are below:

 

Table page 1 

 

Table page 2 

You may download the spreadsheet file in Quattro Pro format
