Quantifying the causal effect of speed cameras on road traffic accidents via an approximate Bayesian doubly robust estimator

Daniel J Graham; Cian Naik; Emma J McCoy; Haojie Li

arXiv:1703.05926·stat.AP·August 20, 2019

TL;DR

This paper introduces an approximate Bayesian doubly-robust method to estimate the causal impact of speed cameras on traffic accidents, addressing previous methodological limitations and providing evidence of safety benefits in England.

Contribution

It develops a novel Bayesian DR approach for causal inference, combining propensity scores and outcome models, to evaluate transport safety interventions.

Findings

01

Speed cameras reduce collisions by approximately 15%.

02

The method provides a statistically robust estimate of causal effects.

03

Results support the effectiveness of speed cameras in improving road safety.

Abstract

This paper quantifies the effect of speed cameras on road traffic collisions using an approximate Bayesian doubly-robust (DR) causal inference estimation method. Previous empirical work on this topic, which shows a diverse range of estimated effects, is based largely on outcome regression (OR) models using the Empirical Bayes approach or on simple before and after comparisons. Issues of causality and confounding have received little formal attention. A causal DR approach combines propensity score (PS) and OR models to give an average treatment effect (ATE) estimator that is consistent and asymptotically normal under correct specification of either of the two component models. We develop this approach within a novel approximate Bayesian framework to derive posterior predictive distributions for the ATE of speed cameras on road traffic collisions. Our results for England indicate…

Tables

Table 1: Simulation results for posterior predictive distributions ($\tau=5.0$).

Model	Av. Est.	Emp. Var.	MSE
BOR1	5.004	0.036	0.036
BOR2	5.350	0.036	0.157
PS1	4.998	0.118	0.119
PS2	276.740	1.84E+07	1.85E+07
BDR1	5.008	0.046	0.046
BDR2	5.018	0.862	0.862
BDR3	5.360	0.946	1.074

Table 2: Bayesian and frequentist bootstrapped estimates of the average treatment effect

Estimator	Bayesian posterior mean	Bayesian s.d.	Bayesian 95% cred. int.	Frequentist est.	Frequentist s.e.
OR	-14.429	3.536	(-21.473, -7.350)	-14.825	3.419
IPW	-15.476	5.480	(-26.367, -4.808)	-15.504	4.935
DR	-14.359	3.605	(-21.841, -7.352)	-14.370	3.494
Naïve (matched sample)	-17.617	5.199	(-28.214, -7.800)	-17.887	5.118
Naïve (full sample)	-33.684	5.911	(-42.133, -13.624)	-34.682	3.573

Equations

$$\tau=\mathbb{E}[Y_{i}(1)]-\mathbb{E}[Y_{i}(0)],$$

$$f_{Z}(z)=f_{Y|D,X}(y|d,x)\,f_{D|X}(d|x)\,f_{X}(x).$$

$$\widehat{\tau}_{OR}=\frac{1}{n}\sum_{i=1}^{n}\left[\Psi^{-1}\{m(1,X_{i};\widehat{\beta})\}-\Psi^{-1}\{m(0,X_{i};\widehat{\beta})\}\right].$$

$$\widehat{\tau}_{IPW}=\frac{1}{n}\sum_{i=1}^{n}\left[\frac{I_{1}(D_{i})\cdot Y_{i}}{\pi(D_{i}|X_{i};\widehat{\alpha})}-\frac{[1-I_{1}(D_{i})]\cdot Y_{i}}{1-\pi(D_{i}|X_{i};\widehat{\alpha})}\right],$$

$$e(D_{i},X_{i};\xi)=\Psi^{-1}\{m(D_{i},X_{i};\xi)\}$$

$$\kappa_{i}(D_{i},X_{i})=\frac{I_{1}(D_{i})}{\pi(D_{i}|X_{i};\alpha)}+\frac{1-I_{1}(D_{i})}{1-\pi(D_{i}|X_{i};\alpha)}.$$

$$\sum_{i=1}^{n}\kappa_{i}(D_{i},X_{i})\frac{1}{\phi}\frac{\partial e(d_{i},x_{i};\xi)}{\partial\xi^{T}}\left[y_{i}-e(d_{i},x_{i};\xi)\right]=0,$$

$$\widehat{\tau}_{DR}=\frac{1}{n}\sum_{i=1}^{n}\left[\Psi^{-1}\{m(1,X_{i};\widehat{\xi})\}-\Psi^{-1}\{m(0,X_{i};\widehat{\xi})\}\right].$$

$$\pi(\theta)\propto\prod_{k=1}^{K}\theta_{k}^{-1}.$$

$$p(\theta|v)\propto\prod_{k=1}^{K}\theta_{k}^{n_{k}-1}.$$

$$\widetilde{L}(\theta)=\prod_{i=1}^{n}f(z_{i};\theta)^{w_{i}},$$

$$\widetilde{L}(\theta)=\prod_{i=1}^{n}\left\{\prod_{k=1}^{K}\theta_{k}^{I_{k}(z_{i})}\right\}^{w_{i}}=\prod_{k=1}^{K}\theta_{k}^{\sum_{i=1}^{n}w_{i}I_{k}(z_{i})}=\prod_{k=1}^{K}\theta_{k}^{n\gamma_{k}},$$

$$p(\gamma)\propto\prod_{k=1}^{K}\gamma_{k}^{n_{k}-1}$$

$$e(D_{i},X_{i};\xi)=\Psi^{-1}\{m_{A}(D_{i},X_{i};\xi)\}$$

$$w_{i}^{(l)}\cdot\kappa_{i}(D_{i},X_{i}).$$

$$\sum_{i=1}^{n}w_{i}^{(l)}\cdot\kappa_{i}(D_{i},X_{i})\cdot\frac{1}{\phi}\frac{\partial e(d_{i},x_{i};\xi)}{\partial\xi^{T}}\left[y_{i}-e(d_{i},x_{i};\xi)\right]=0,$$

$$\kappa_{i}(d_{i},x_{i};\alpha)=\frac{I_{1}(d_{i})}{\pi(d_{i}|x_{i};\alpha)}+\frac{1-I_{1}(d_{i})}{1-\pi(d_{i}|x_{i};\alpha)}.$$

$$\Psi^{-1}\{m_{A}(d_{i},x_{i};\xi^{(l)})\}.$$

$$\tau_{BDR}^{(m)}=\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m_{A}(1,x_{v};\xi^{(m)})\}-\Psi^{-1}\{m_{A}(0,x_{v};\xi^{(m)})\}\right].$$

$$X\sim\text{Normal}(0,10)$$

$$D\sim\text{Bernoulli}(\text{expit}(\alpha_{0}+\alpha_{1}X))$$

$$Y\sim\text{Normal}(\beta_{0}+\beta_{1}D+\beta_{2}X,5)$$

$$\tau_{BOR}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m(1,x_{v};\beta^{(l)})\}-\Psi^{-1}\{m(0,x_{v};\beta^{(l)})\}\right]\right].$$

$$\tau_{PS1}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[y_{v}\cdot\frac{d_{v}-\pi(d_{v}|x_{v};\alpha^{(l)})}{\pi(d_{v}|x_{v};\alpha^{(l)})\,(1-\pi(d_{v}|x_{v};\alpha^{(l)}))}\right]\right]$$

$$\tau_{BDR}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m_{A}(1,x_{v};\xi^{(l)})\}-\Psi^{-1}\{m_{A}(0,x_{v};\xi^{(l)})\}\right]\right].$$

Taxonomy

Topics: Traffic and Road Safety · Traffic Prediction and Management Techniques · Vehicle emissions and performance

Full text

Quantifying the causal effect of speed cameras on road traffic collisions via an approximate Bayesian doubly robust estimator

Daniel J. Graham

Corresponding author: Department of Civil Engineering, Imperial College London, London, SW7 2AZ, UK.

Cian Naik

Department of Mathematics, Imperial College London, London, UK

Emma J. McCoy

Department of Mathematics, Imperial College London, London, UK

Haojie Li

School of Transportation, Southeast University, Nanjing, China

Abstract

This paper quantifies the effect of speed cameras on road traffic collisions using an approximate Bayesian doubly-robust (DR) causal inference estimation method. Previous empirical work on this topic, which shows a diverse range of estimated effects, is based largely on outcome regression (OR) models using the Empirical Bayes approach or on simple before and after comparisons. Issues of causality and confounding have received little formal attention. A causal DR approach combines propensity score (PS) and OR models to give an average treatment effect (ATE) estimator that is consistent and asymptotically normal under correct specification of either of the two component models. We develop this approach within a novel approximate Bayesian framework to derive posterior predictive distributions for the ATE of speed cameras on road traffic collisions. Our results for England indicate significant reductions in the number of collisions at speed camera sites (mean ATE = -15%). Our proposed method offers a promising approach for evaluation of transport safety interventions.

Keywords: Doubly robust; Bayesian inference; propensity score; average treatment effect; speed cameras; casualties.

1 Introduction

Fixed speed limit enforcement cameras are a common intervention used to encourage drivers to comply with maximum legal speed limits. The cameras are installed at sites on selected links in order to detect speed limit violations, which can subsequently be punished with monetary fines, driver licence disqualification points, or prosecution. Since the introduction of speed cameras (SCs) there has been considerable debate about their effects on road traffic collisions (RTCs). At various times claims have been made that SCs serve to reduce RTCs, that they have no effect, or even that they increase RTCs by encouraging more erratic driving behaviour.

A number of academic studies of the effect of speed cameras on RTCs have been undertaken (for a review see Li et al. 2013). Most studies find that speed cameras have led to a reduction in RTCs, but the range of estimated effects is large (from 0% to -55%). Variation in estimates is to be expected given that study results pertain to diverse empirical contexts, but it is also the case that a number of different methods have been applied which can have a critical influence on results obtained. In particular, since SCs are not randomly assigned, it is essential that any adopted method recognises that the observed relationship between SCs and RTCs may be subject to confounding. Confounding arises when the characteristics that influence treatment assignment (i.e. whether a site is ‘treated’ or ‘untreated’ with an SC) also matter for outcomes (i.e. RTCs). Regression to the mean (RTM), for instance, is a well known manifestation of confounding that arises via ‘selection bias’.

The extent to which confounding has been recognised and addressed in existing studies varies considerably. Some studies have simply ignored it, using simple before-and-after methods with control groups (e.g. Christie et al. 2003, Cunningham et al. 2000, De Pauw et al. 2014, Gains et al. 2004, 2005, Goldenbeld and van Schagen 2005, Jones et al. 2008, Maher 2015). Others have used the empirical Bayes (EB) method as suggested by Hauer et al. (2002), largely to adjust for effects of confounding that arise via RTM (e.g. Chen et al. 2002, Elvik 1997, Hoye 2015, Mountain et al. 2004, 2005, Shin et al. 2009). Finally, there are a small number of studies that have used time-series methods, either interrupted time-series analyses with control groups or ARIMA, to test for changes in outcome rates (e.g. Carnis and Blais 2013, Hess and Polak 2003, Keall et al. 2001). Where studies have attempted to address confounding this has been done via the inclusion of covariates in outcome regression (OR) models, typically using Poisson or negative binomial Generalised Linear Models (GLMs).

In a previous paper we adopted a propensity score (PS) matching approach to evaluate the effectiveness of speed cameras (see Li et al. 2013). A key advantage of the PS approach over the OR approach is that it provides an effective way of isolating a valid control group by ensuring that the distribution of pre-treatment covariates matches that of the treated group and that genuine overlap in the support of the covariates exists between the two groups. However, as with the OR approach, valid inference from PS models crucially depends on the unknown PS model being correctly specified.

In this paper we build on our previous work by developing and applying an estimation approach which we believe has much to offer in evaluating the effectiveness of road safety interventions. Our approach uses the principle of doubly-robust (DR) estimation, which provides robustness to model misspecification by combining both OR and PS models to derive an average treatment effect (ATE) estimator which is consistent and asymptotically normal under correct specification of just one of the two component models. The DR approach is attractive for our application because the PS and OR models we can construct make different assumptions about the nature of confounding. For the PS model, we are able to faithfully represent via measured covariates the formal criteria that exist for the assignment of speed cameras to sites. For the OR model, we can difference our response variable before and after treatment to allow for the existence of site level time-invariant unobserved effects in addition to measured confounders.

To avoid common sources of misspecification error, we estimate our component models using semiparametric Generalized Additive Mixed Models (GAMMs) which make minimal a priori assumptions on the functional form of the relationships under study. We also use a matching algorithm prior to forming the DR model to establish a valid control group. Thus, in our approach, potential biases from confounding are addressed by combining three compatible modelling tools: via matching to achieve comparability between treated and control sites, via a regression model for RTCs, and via a model for the treatment assignment mechanism.

DR estimators have been studied and applied extensively in the frequentist setting (e.g. Robins 2000, Robins et al. 2000, Robins and Rotnitzky 2001, van der Laan and Robins 2003, Lunceford and Davidian 2004, Bang and Robins 2005, Kang and Schafer 2007). A further contribution of the paper is that we develop our binary DR estimator within the Bayesian paradigm. A Bayesian representation of the DR model has proven difficult to formulate in previous work because DR estimators are typically constructed as solutions to estimating equations based on a set of moment restrictions that do not imply fully specified likelihood functions. We choose the Bayesian paradigm for three main reasons. First, DR estimation of the ATE involves prediction and extrapolation over covariate distributions with underlying uncertainty in parameter estimates. Bayesian inference provides a suitable framework for prediction that explicitly addresses such uncertainty in the sense that both the predicted observations, and the relevant parameters for prediction, have the same random status. Second, by deriving a posterior predictive distribution for the ATE, rather than a fixed value, we can make probability statements about the causal quantity of interest allowing us to discuss findings in relation to specific hypotheses or in terms of credible intervals which can offer a more intuitive understanding of the effects of SCs for public policy formulation. Finally, we develop an approximate Bayesian approach that can utilise prior information about the parameters of interest, which could be useful in evaluating safety interventions when historical data or training data from other regions are available.

The paper is structured as follows. Section two outlines broad trends in road traffic casualties for Britain and then sets out a formal causal modelling framework to estimate the effects of SCs on RTCs. Section three describes our approximate Bayesian DR approach and presents some simulations that demonstrate its properties. Section four describes the data available for estimation and outlines our chosen model specifications. Results are then presented in section five and conclusions are drawn in the final section.

2 A causal inference framework to quantify the effects of speed cameras

2.1 Road traffic casualties in Britain

For the year ending September 2016 the UK DfT recorded a total of 182,560 casualties on British roads of which 25,160 were classified as killed or seriously injured (KSIs) (DfT 2017). Since 2010 the annual numbers of fatalities and KSIs have not changed significantly, following several years in which road safety was improving. The average number of fatal road traffic incidents over the period 2010 to 2016 is approximately 1,800. Since the volume of road traffic has continued to grow over this period, however, the number of fatalities per vehicle mile driven has been falling (DfT 2016).

The DfT argue that there is good evidence to suggest that while the absolute number of fatalities on British roads now appears to be relatively static, overall absolute casualty numbers are continuing to fall. In short, levels of safety appear to be improving in relative terms and not deteriorating in absolute terms. Given the changes that have occurred in vehicle technology, medical care, and road safety interventions, however, the DfT also note that a comprehensive causal understanding of the factors underpinning casualty trends is currently out of reach. In this paper we attempt to contribute to such an understanding by quantifying the causal impact of one type of safety intervention: speed cameras (SCs).

2.2 ATE estimation within the potential outcomes framework

Our sample comprises $n$ links on the road network, indexed $i=1,...,n$. Some links have an SC; others do not. We define $D_{i}\in\{1,0\}$ as a binary random variable indicating the presence or otherwise of an SC and we refer to this as the treatment variable. We are interested in the effect of the treatment on an outcome $Y_{i}$, which measures collision frequency. We define $Y_{i}(1)$ and $Y_{i}(0)$ as the potential outcomes for unit $i$ under treated and control status respectively. Recognising that SCs are not assigned randomly, we also define $X_{i}$ as a random vector of pre-treatment covariates that capture characteristics of links that are relevant to whether an SC was assigned or not, and are also relevant for outcomes. Thus, the data we observe for each link takes the form of a random vector, $z_{i}=(y_{i},d_{i},x_{i})$, where $y_{i}$ denotes a response, $d_{i}$ the treatment received, and $x_{i}$ a vector of pre-treatment covariates.

Ideally, we would assess the effects of SCs on each link by calculating the individual causal effect (ICE): $\tau_{i}=Y_{i}(1)-Y_{i}(0)$ , but the observed data reveal only actual outcomes not potential outcomes. Thus we observe the random variable $Y_{i}=Y_{i}(1)I_{1}(D_{i})+Y_{i}(0)(1-I_{1}(D_{i}))$ , where $I_{1}(D_{i})$ is the indicator function for receiving the treatment, but we do not observe the joint density, $f(Y_{i}(0),Y_{i}(1))$ , since a SC cannot be both present and absent on a link simultaneously. Instead, our target of inference is the ATE, defined as

$$\tau=\mathbb{E}[Y_{i}(1)]-\mathbb{E}[Y_{i}(0)],$$

which measures the difference in expected outcomes under treatment and control status.

A key insight of the potential outcomes approach is that if we focus on estimating the ATE then we do not have to observe all potential outcomes, even under a non-random treatment assignment, as long as three key assumptions hold. First, the potential outcomes for unit $i$ must be conditionally independent of the treatment assignment given a (sufficient) set of observed covariates $X_{i}$: $(Y_{i}(0),Y_{i}(1))\perp\!\!\!\perp I_{1}(D_{i})\mid X_{i}$. Second, the support of the conditional distribution of $X_{i}$ given a particular treatment status must overlap with that of $X_{i}$ given any other treatment status: $0<\text{Pr}(I_{1}(D_{i})=1|X_{i}=x)<1,\ \forall\ x$. Third, the relationship between observed and potential outcomes must comply with the Stable Unit Treatment Value Assumption (SUTVA) (e.g. Rubin 1978, 1980, 1986, 1990), which requires that the observed response under a given treatment allocation is equivalent to the potential response under that treatment allocation: $Y_{i}=I_{1}(D_{i})Y_{i}(1)+(1-I_{1}(D_{i}))Y_{i}(0)$ for all $i=1,...,n$.

The three assumptions defined above, which are together referred to by Rosenbaum and Rubin (1983) as strong ignorability, allow for identification of causal effects from observational data because if they hold the ATE can be derived as,

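$$\begin{aligned}\tau&=\mathbb{E}_{X}\left\{\mathbb{E}[Y_{i}(1)|X_{i}]-\mathbb{E}[Y_{i}(0)|X_{i}]\right\}&\text{(1a)}\\&=\mathbb{E}_{X}\left\{\mathbb{E}[Y_{i}(1)|D_{i}=1,X_{i}]-\mathbb{E}[Y_{i}(0)|D_{i}=0,X_{i}]\right\}&\text{(1b)}\\&=\mathbb{E}_{X}\left\{\mathbb{E}[Y_{i}|D_{i}=1,X_{i}]-\mathbb{E}[Y_{i}|D_{i}=0,X_{i}]\right\}.&\text{(1c)}\end{aligned}$$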

Conditional independence justifies the equality of (1a) and (1b), the SUTVA allows the substitution of observed for potential outcomes to give (1c), and overlap ensures that the population ATE in (1c) is estimable since there are units in both the treated and untreated groups.

Thus, if strong ignorability holds, the potential outcomes approach offers a route to obtaining valid causal estimates of the ATE of SCs. To proceed we need to estimate the relevant expectations in (1c) above.

2.3 Causal estimators

Using the notation of Tsiatis and Davidian (2007), we define joint densities of the observed data of the form

$$f_{Z}(z)=f_{Y|D,X}(y|d,x)\,f_{D|X}(d|x)\,f_{X}(x).$$

Given strong ignorability, estimation of the ATE of SCs can proceed in one of the following ways:

i.

Outcome regression (OR) model - leave $f_{D|X}(d|x)$ and $f_{X}(x)$ unspecified and posit a model for $\mathbb{E}[Y_{i}|D_{i},X_{i}]$, the mean of the conditional density of the response given the covariates, using an OR model $\Psi^{-1}\{m(D_{i},X_{i};\beta)\}$, for known link function $\Psi$, regression function $m()$, and unknown parameter vector $\beta$. If the OR is correctly specified for the mean response then the ATE can be consistently estimated by

$$\widehat{\tau}_{OR}=\frac{1}{n}\sum_{i=1}^{n}\left[\Psi^{-1}\{m(1,X_{i};\widehat{\beta})\}-\Psi^{-1}\{m(0,X_{i};\widehat{\beta})\}\right].$$

ii.

Propensity score (PS) model - leave $f_{Y|D,X}(y|d,x)$ and $f_{X}(x)$ unspecified but assume a model for $f_{D|X}(d|x)$ ; the conditional density of treatment assignment given covariates. This is a propensity score (PS) model, denoted $\pi(D_{i}|X_{i};\alpha)$ , which can be used to form a number of different nonparametric estimators but of primary interest here is its use in the weighting estimator attributed to Horvitz and Thompson (1952)

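$$\widehat{\tau}_{IPW}=\frac{1}{n}\sum_{i=1}^{n}\left[\frac{I_{1}(D_{i})\cdot Y_{i}}{\pi(D_{i}|X_{i};\widehat{\alpha})}-\frac{[1-I_{1}(D_{i})]\cdot Y_{i}}{1-\pi(D_{i}|X_{i};\widehat{\alpha})}\right],$$

which is consistent under correct specification of the PS by virtue of the fact that $\mathbb{E}[Y_{i}(1)]=\mathbb{E}\{\left[Y_{i}(1)\cdot I_{1}(D_{i})\right]/\pi(D_{i}|X_{i};\alpha)\}$, and similarly for control treatment status.

iii.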

Doubly-robust (DR) model - leave $f(x)$ unspecified but assume both an OR model and a PS model and combine them to form a DR estimator. This is achieved by weighting or augmenting the OR model with a function of the inverse of the estimated PS to give a DR model. In this paper we estimate the weighted model

$$e(D_{i},X_{i};\xi)=\Psi^{-1}\{m(D_{i},X_{i};\xi)\}$$

where the unknown parameter vector $\xi$ is obtained by weighting the model with

$$\kappa_{i}(D_{i},X_{i})=\frac{I_{1}(D_{i})}{\pi(D_{i}|X_{i};\alpha)}+\frac{1-I_{1}(D_{i})}{1-\pi(D_{i}|X_{i};\alpha)}.$$

This model will consistently estimate $\mathbb{E}[Y_{i}|D_{i},X_{i}]$ if the model $\Psi^{-1}\{m(I_{1}(D_{i}),X_{i};\beta)\}$ is correct because while weighting may induce inefficiency it will leave the consistency and asymptotic normality of the OR estimates unchanged. If the OR model is incorrectly specified, but the PS is correctly specified, the model is still consistent because weighting gives rise to estimating equations of the form

$$\sum_{i=1}^{n}\kappa_{i}(D_{i},X_{i})\frac{1}{\phi}\frac{\partial e(d_{i},x_{i};\xi)}{\partial\xi^{T}}\left[y_{i}-e(d_{i},x_{i};\xi)\right]=0,$$

where $\phi_{i}\equiv\phi(D_{i},X_{i})$ is a working conditional variance for $Y_{i}$ given $(D_{i},X_{i})$, which effectively correct for the bias in approximating $\mathbb{E}[Y_{i}|D_{i},X_{i}]$ using $\Psi^{-1}\{m(D_{i},X_{i};\beta)\}$ (for a proof see Lunceford and Davidian 2004).

We use estimates of $\xi$ to form the DR estimator

$$\widehat{\tau}_{DR}=\frac{1}{n}\sum_{i=1}^{n}\left[\Psi^{-1}\{m(1,X_{i};\widehat{\xi})\}-\Psi^{-1}\{m(0,X_{i};\widehat{\xi})\}\right].$$
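The three estimators of Section 2.3 can be compared on a small synthetic example. The sketch below is illustrative only and is not the authors' implementation: it assumes an identity link, a hypothetical data generating process with a true ATE of 5, a logistic PS model fitted by Newton-Raphson, and helper names (`fit_logistic`, `wls`, `mean_contrast`) introduced here for convenience. The DR fit deliberately omits $X$ from the OR model and relies on the $\kappa_{i}$ weights to correct the resulting bias.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(X, d, iters=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ alpha)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        alpha = alpha + np.linalg.solve(hess, X.T @ (d - p))
    return alpha

def wls(A, y, w):
    """Weighted least squares: solve (A' W A) coef = A' W y."""
    Aw = A * w[:, None]
    return np.linalg.solve(Aw.T @ A, Aw.T @ y)

def mean_contrast(design, coef, x):
    """Average of m(1, X_i) - m(0, X_i) under an identity link."""
    one, zero = np.ones_like(x), np.zeros_like(x)
    return np.mean(design(one, x) @ coef - design(zero, x) @ coef)

# Hypothetical data (not the paper's): X confounds both D and Y, true ATE = 5.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(0.0, 1.0, n)
d = rng.binomial(1, expit(0.5 + x)).astype(float)
y = 10.0 + 5.0 * d + 2.0 * x + rng.normal(0.0, 1.0, n)

# Propensity score model pi(D | X; alpha), correctly specified here.
X_ps = np.column_stack([np.ones(n), x])
pi_hat = expit(X_ps @ fit_logistic(X_ps, d))

# Naive difference in means, which ignores confounding.
tau_naive = y[d == 1].mean() - y[d == 0].mean()

# OR estimator with a correctly specified mean model m = b0 + b1*D + b2*X.
full = lambda dv, xv: np.column_stack([np.ones_like(xv), dv, xv])
tau_or = mean_contrast(full, wls(full(d, x), y, np.ones(n)), x)

# IPW (Horvitz-Thompson) estimator.
tau_ipw = np.mean(d * y / pi_hat - (1.0 - d) * y / (1.0 - pi_hat))

# DR estimator: misspecified OR model (X omitted) weighted by kappa_i.
kappa = d / pi_hat + (1.0 - d) / (1.0 - pi_hat)
short = lambda dv, xv: np.column_stack([np.ones_like(xv), dv])
tau_dr = mean_contrast(short, wls(short(d, x), y, kappa), x)

print(f"naive={tau_naive:.3f} OR={tau_or:.3f} IPW={tau_ipw:.3f} DR={tau_dr:.3f}")
```

In this setup the naive contrast is biased upwards, while the OR, IPW and weighted DR estimates should all land near 5. The IPW estimate is typically the noisiest of the three.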

3 Approximate Bayesian doubly-robust estimation

So far we have discussed DR estimation within the context of frequentist semiparametric inference. As mentioned in the introduction to the paper there are good reasons why a Bayesian inferential approach is particularly beneficial for estimation of road safety interventions. Bayesian inference has, however, proven difficult to apply for DR estimators because they are based on a set of moment restrictions which do not provide fully specified likelihood functions. Here, we make some improvements to the approach proposed by Graham et al. (2016) in the context of continuous treatment. In contrast to that paper we focus on binary treatments using PS weighting rather than augmentation to achieve the DR model and we implement ways of incorporating prior information into the posterior distribution of the ATE. The basic theory underpinning approximate Bayesian inference in this context is covered comprehensively in Graham et al. (2016) and so we provide only a brief summary here.

The Bayesian bootstrap was first introduced by Rubin (1981) and applied in weighted likelihood models by Newton and Raftery (1994). The basic idea is to create new datasets by repeatedly re-weighting the original data in order to obtain the posterior distribution for some parameter of interest. If we treat our observed data, $z_{i}$ say, as effectively coming from a multinomial distribution with distinct values $a_{k}$, $k=1,...,K$, and attach a probability to each distinct value $\theta=(\theta_{1},...,\theta_{K})$, then by placing an improper Dirichlet prior on $\theta$

$$\pi(\theta)\propto\prod_{k=1}^{K}\theta_{k}^{-1}.$$

the posterior density also has a Dirichlet distribution

$$p(\theta|v)\propto\prod_{k=1}^{K}\theta_{k}^{n_{k}-1}.$$

with parameter $n_{k}$. This posterior can be simulated via the weighted likelihood

$$\widetilde{L}(\theta)=\prod_{i=1}^{n}f(z_{i};\theta)^{w_{i}},$$

in which the weights $w=(w_{1},...,w_{n})$ are distributed according to the uniform Dirichlet distribution and simulated as $n$ independent standard exponential (i.e. gamma(1,1)) variates and standardised. The weighted likelihood reduces to

$$\widetilde{L}(\theta)=\prod_{i=1}^{n}\left\{\prod_{k=1}^{K}\theta_{k}^{I_{k}(z_{i})}\right\}^{w_{i}}=\prod_{k=1}^{K}\theta_{k}^{\sum_{i=1}^{n}w_{i}I_{k}(z_{i})}=\prod_{k=1}^{K}\theta_{k}^{n\gamma_{k}},$$

say, where $n\gamma_{k}$ is the sum of the weights $w_{i}$ for which $z_{i}=a_{k}$. Since the vector $\gamma=(\gamma_{1},...,\gamma_{K})$ has a Dirichlet distribution with parameters $n_{k}=(n_{1},...,n_{K})$,

$$p(\gamma)\propto\prod_{k=1}^{K}\gamma_{k}^{n_{k}-1}$$

and since the point of maximisation of $\widetilde{L}(\theta)$ is $\widetilde{\theta}=\gamma$, the solutions to the maximised weighted likelihood function with repeatedly sampled uniform Dirichlet weights $w^{(l)}$ represent a sample from the posterior of $\theta$ under the improper prior $\prod_{k}\theta^{-1}_{k}$.
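The weight construction described above can be sketched in a few lines. The example is a minimal illustration under assumptions not taken from the paper: the data `z` are hypothetical Normal draws, and the parameter of interest is a mean, for which the maximiser of the weighted Normal likelihood is simply the weighted average. The helper `dirichlet_weights` is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_weights(n, rng):
    """Uniform Dirichlet weights: n standard exponential draws, standardised to sum to 1."""
    e = rng.exponential(scale=1.0, size=n)
    return e / e.sum()

# Hypothetical observed data.
z = rng.normal(loc=3.0, scale=2.0, size=500)

# Each set of weights w^(l) gives one draw from the approximate posterior of the mean.
L = 4000
draws = np.array([np.sum(dirichlet_weights(len(z), rng) * z) for _ in range(L)])

post_mean = draws.mean()
post_sd = draws.std(ddof=1)
print(f"posterior mean {post_mean:.3f}, posterior s.d. {post_sd:.3f}")
```

For a mean, the spread of these draws should be close to the sample standard deviation of `z` divided by $\sqrt{n+1}$, which gives a quick sanity check on the weights.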

To apply the Bayesian bootstrap to our DR model we estimate

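$$e(D_{i},X_{i};\xi)=\Psi^{-1}\{m_{A}(D_{i},X_{i};\xi)\}$$

with weights

$$w_{i}^{(l)}\cdot\kappa_{i}(D_{i},X_{i}).$$

The maximiser of $\widetilde{L}(\xi)$ , which we denote $\widetilde{\xi}$ , implies a solution to

$$\sum_{i=1}^{n}w_{i}^{(l)}\cdot\kappa_{i}(D_{i},X_{i})\cdot\frac{1}{\phi}\frac{\partial e(d_{i},x_{i};\xi)}{\partial\xi^{T}}\left[y_{i}-e(d_{i},x_{i};\xi)\right]=0,\qquad(3)$$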

which as noted above has the DR property. We repeatedly draw sets of random weights $\{w^{(l)}_{i}\}^{n}_{i=1}$ as $n$ standardised independent standard exponential variates and solve (3) to build up an empirical posterior density of $\widetilde{\xi}$ , denoted $p_{n}(\widetilde{\xi})$ , from which the sampled values $\widetilde{\xi}^{(l)}$ are consistent with the DR estimating equations.

Newton and Raftery (1994) apply sampling-importance resampling (SIR) to improve accuracy of the weighted bootstrap approach, but this improvement requires a fully specified likelihood function. Instead, for our restricted moment model, we use the resampling scheme proposed by Muliere and Secchi (1996) which extends Rubin’s bootstrap in a general Bayesian nonparametric context. Two attractive features of Muliere and Secchi’s approach for causal modelling are that it ensures that predictive distributions are not constrained to be concentrated on observed values and it allows us to take prior opinions into account. The posterior predictive distribution of the ATE, incorporating prior information, is obtained in the following way.

i.

Estimate the PS model $\pi(D_{i}|X_{i};\alpha)$, and form

$$\kappa_{i}(d_{i},x_{i};\alpha)=\frac{I_{1}(d_{i})}{\pi(d_{i}|x_{i};\alpha)}+\frac{1-I_{1}(d_{i})}{1-\pi(d_{i}|x_{i};\alpha)}.$$

ii.

Draw a single set of random weights $\{w^{(l)}_{i}\}^{n}_{i=1}$, form the combined weights $w_{i}^{(l)}\cdot\widehat{\kappa}_{i}(d_{i},x_{i};\widehat{\alpha})$, and estimate the weighted model

$$\Psi^{-1}\{m_{A}(d_{i},x_{i};\xi^{(l)})\}.$$

iii.

Repeatedly compute (ii) using new weights $\{w^{(l)}_{i}\}^{n}_{i=1}$ to obtain the empirical posterior distribution $p_{n}(\widetilde{\xi})$.

iv.

Introduce a prior distribution $p_{0}$ for $\xi$ and a positive number $k$, the ‘measure of faith’ that we have in this prior. This can range anywhere from 1 to a size comparable to the number of samples of $\xi$.

v.

Generate $m$ observations $x_{1}^{*},...,x_{m}^{*}$ from $\frac{kp_{0}+Lp_{n}}{k+L}$, where $p_{n}$ is as above. We choose $m=L$ in our case.

vi.

For $i=1,...,m$ generate $v_{i}$ from a $\Gamma\left(\frac{L+k}{m},1\right)$ distribution.

vii.

Sample new parameters $\widetilde{\xi}_{MS}$ from $x_{1}^{*},...,x_{m}^{*}$ using the weights $v_{1},...,v_{m}$ to form the posterior $p_{m}(\widetilde{\xi})$.

viii.

Resample $V$ values of the covariate vector uniformly over the observed values and a single vector $\xi^{(m)}$ from $p_{m}(\widetilde{\xi})$.

ix.

Form a sampled value of the ATE random variable as

$$\tau_{BDR}^{(m)}=\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m_{A}(1,x_{v};\xi^{(m)})\}-\Psi^{-1}\{m_{A}(0,x_{v};\xi^{(m)})\}\right].$$

x.

Repeat this procedure $M$ times, $m=1,...,M$, to obtain the posterior predictive distribution.
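Steps (i) to (x) can be sketched end to end as follows. This is a rough illustration under several assumptions that are not in the paper: the data are hypothetical with a true ATE of 5, the link is the identity with $m_{A}(d,x;\xi)=\xi_{0}+\xi_{1}d+\xi_{2}x$, the prior $p_{0}$ is an independent Normal with hypothetical mean and unit standard deviation, $k=5$, and steps (v) to (ix) are redrawn on each of the $M$ repetitions. The helper names are introduced here for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(X, d, iters=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    alpha = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ alpha)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        alpha = alpha + np.linalg.solve(hess, X.T @ (d - p))
    return alpha

def wls(A, y, w):
    """Weighted least squares: solve (A' W A) coef = A' W y."""
    Aw = A * w[:, None]
    return np.linalg.solve(Aw.T @ A, Aw.T @ y)

def design(dv, xv):
    """Design matrix for m_A(d, x; xi) = xi0 + xi1*d + xi2*x (identity link)."""
    return np.column_stack([np.ones_like(xv), dv, xv])

# Hypothetical data, true ATE = 5.
n = 2000
x = rng.normal(0.0, 1.0, n)
d = rng.binomial(1, expit(0.5 + x)).astype(float)
y = 10.0 + 5.0 * d + 2.0 * x + rng.normal(0.0, 1.0, n)

# (i) Fit the PS model and form kappa_i.
X_ps = np.column_stack([np.ones(n), x])
pi_hat = expit(X_ps @ fit_logistic(X_ps, d))
kappa = d / pi_hat + (1.0 - d) / (1.0 - pi_hat)

# (ii)-(iii) Bayesian bootstrap draws of xi under combined weights w * kappa.
L = 500
A = design(d, x)
xi_draws = np.empty((L, A.shape[1]))
for l in range(L):
    w = rng.exponential(1.0, n)
    xi_draws[l] = wls(A, y, (w / w.sum()) * kappa)

# (iv) Hypothetical prior p0 for xi and measure of faith k.
k = 5.0
prior_mean, prior_sd = np.array([10.0, 5.0, 2.0]), 1.0

def draw_ate(V=n):
    m = L
    # (v) m draws from the mixture (k*p0 + L*p_n) / (k + L).
    x_star = xi_draws[rng.integers(0, L, m)]
    from_prior = rng.random(m) < k / (k + L)
    if from_prior.any():
        x_star[from_prior] = rng.normal(prior_mean, prior_sd, (int(from_prior.sum()), 3))
    # (vi) Gamma weights v_i.
    v = rng.gamma((L + k) / m, 1.0, m)
    # (vii)-(viii) One parameter vector drawn with probabilities proportional to v,
    # and V covariate values resampled uniformly from the observed ones.
    xi_m = x_star[rng.choice(m, p=v / v.sum())]
    x_v = rng.choice(x, size=V, replace=True)
    # (ix) One sampled value of the ATE.
    return np.mean(design(np.ones(V), x_v) @ xi_m - design(np.zeros(V), x_v) @ xi_m)

# (x) Repeat M times to build the posterior predictive distribution.
M = 1000
tau_draws = np.array([draw_ate() for _ in range(M)])
post_mean = tau_draws.mean()
ci = np.percentile(tau_draws, [2.5, 97.5])
print(f"posterior mean {post_mean:.3f}, 95% interval ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a linear model and no treatment-covariate interaction, the contrast in step (ix) does not depend on the resampled covariates, so the covariate resampling only matters for richer specifications such as the GAMMs used in the application.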

3.1 Simulations

In this subsection we present some simulations to demonstrate the DR properties of our approximate Bayesian approach. The simulations are based on the following data generating process: a binary treatment $D$ is assigned as a function of covariate $X$, and the outcome of interest $Y$ depends on both treatment $D$ and covariate $X$:

$$X\sim\text{Normal}(0,10)$$

$$D\sim\text{Bernoulli}(\text{expit}(\alpha_{0}+\alpha_{1}X))$$

$$Y\sim\text{Normal}(\beta_{0}+\beta_{1}D+\beta_{2}X,5)$$

where $\alpha_{0}=2$ , $\alpha_{1}=0.2$ , $\beta_{0}=10$ , $\beta_{1}=5$ , $\beta_{2}=0.2$ . The true ATE is given by parameter $\beta_{1}$ , that is $\tau=5.0$ .
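The data generating process can be written down directly. In the sketch below the second argument of each Normal distribution is read as a variance, which is an assumption since the text does not say whether it is a variance or a standard deviation. The function name `simulate` and the seed are likewise introduced here for illustration.

```python
import numpy as np

def simulate(n, rng, alpha=(2.0, 0.2), beta=(10.0, 5.0, 0.2), var_x=10.0, var_y=5.0):
    """One dataset from the simulation design; var_x and var_y are assumed to be variances."""
    x = rng.normal(0.0, np.sqrt(var_x), n)
    ps = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))   # expit(alpha0 + alpha1 * X)
    d = rng.binomial(1, ps)
    y = rng.normal(beta[0] + beta[1] * d + beta[2] * x, np.sqrt(var_y))
    return y, d, x, ps

rng = np.random.default_rng(2019)
y, d, x, ps = simulate(1000, rng)

# Naive difference in means: confounded by X, so generally not equal to beta1 = 5.
tau_naive = y[d == 1].mean() - y[d == 0].mean()
print(f"share treated {d.mean():.3f}, naive contrast {tau_naive:.3f}")
```

Because $\alpha_{0}=2$, most simulated units are treated, so the control group in each dataset of size 1,000 is comparatively small.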

The following models are tested:

1.

$\widehat{\tau}_{BOR1}$ - an approximate Bayesian OR model based on the correctly specified model: $\mathbb{E}[Y|D,X]=\beta_{0}+\beta_{1}D+\beta_{2}X$. The point estimate reported in the simulations is the mean value of the ATE posterior predictive distribution, i.e.

$$\tau_{BOR}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m(1,x_{v};\beta^{(l)})\}-\Psi^{-1}\{m(0,x_{v};\beta^{(l)})\}\right]\right].$$

2.

$\widehat{\tau}_{BOR2}$ - same as [1.] except based on an incorrectly specified OR model with covariate $X$ excluded.

3.

$\widehat{\tau}_{PS1}$ - an approximate Bayesian inverse PS weighted model based on the correctly specified PS model

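$$\tau_{PS1}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[y_{v}\cdot\frac{d_{v}-\pi(d_{v}|x_{v};\alpha^{(l)})}{\pi(d_{v}|x_{v};\alpha^{(l)})\,(1-\pi(d_{v}|x_{v};\alpha^{(l)}))}\right]\right]$$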

4.

$\widehat{\tau}_{PS2}$ - an approximate Bayesian inverse PS weighted model based on an incorrectly specified PS model, in which the PS is generated randomly from the continuous uniform distribution: $\widehat{\pi}(D|X)\sim\text{Uniform}(0,1)$ .

5.

$\widehat{\tau}_{BDR1}$ - an approximate Bayesian DR model based on an incorrectly specified OR model ( $X$ excluded) but with weights based on the correct PS model

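$$\tau_{BDR}=\frac{1}{L}\sum_{l=1}^{L}\left[\frac{1}{V}\sum_{v=1}^{V}\left[\Psi^{-1}\{m_{A}(1,x_{v};\xi^{(l)})\}-\Psi^{-1}\{m_{A}(0,x_{v};\xi^{(l)})\}\right]\right].$$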

6.

$\widehat{\tau}_{BDR2}$ - an approximate Bayesian DR model based on a correctly specified OR model but with weights based on the incorrect PS model.

7.

$\widehat{\tau}_{BDR3}$ - an approximate Bayesian DR model based on the incorrectly specified OR model weighted with weights based on the incorrect PS model.

The simulations are based on 1000 runs on generated datasets of size 1,000. In each case, we place a Normal prior on the treatment coefficient $\beta_{1}$ , with mean equal to the true value (5 in this case). We set the measure of faith $k$ to be relatively low so as not to overly affect the results. Table 1 shows our simulation results. Mean values and variances of the point estimates obtained (i.e. means and variances of the ATE distributions) and the mean squared error (MSE) are reported.

The mean of the posterior distribution for the ATE from the correctly specified OR model, $\widehat{\tau}_{BOR1}$ , provides a good approximation to the true value of $\tau$ . The incorrectly specified OR model, BOR2, fails to address confounding and consequently $\widehat{\tau}_{BOR2}$ provides a poor approximation to the true ATE. A good estimate of $\tau$ is achieved via the correctly specified PS model ( $\widehat{\tau}_{PS1}$ ), but when the PS model is misspecified ( $\widehat{\tau}_{PS2}$ ) the estimate of the ATE is far from the true value. In our simulations the PS model is severely misspecified, or simply wrong, having been generated randomly. This tendency of the inverse PS model to fail quite considerably under severe misspecification is well known in the literature (Kang and Schafer 2007). Weighting the incorrectly specified OR model with weights $\widehat{\kappa}(D,X)$ , based on a correctly specified PS model, as in the BDR1 model, corrects for misspecification bias, giving an average point estimate very close to the true value but slightly larger posterior variances relative to the correctly specified OR model. The BDR2 model simulation also produces valid point estimates because weighting by weights based on an incorrectly specified PS model does not induce bias, although it does increase variance. Finally, if both the OR and PS models are wrongly specified, as in BDR3, the model fails to produce a good point estimate of the mean ATE.
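The double-robustness pattern described above can be illustrated with a minimal frequentist sketch in Python. It is not the approximate Bayesian estimator used in this paper: each OR model is fitted by (weighted) least squares and the ATE is read off the coefficient on $D$. The sketch rests on several assumptions that are ours alone: $X\sim N(0,3^{2})$, a logistic assignment model with intercept $\alpha_{0}$ and slope $\alpha_{1}$, a linear outcome with standard Normal noise, and weights of the form $D/\pi+(1-D)/(1-\pi)$. The misspecified PS is drawn from Uniform(0.05, 0.95) rather than Uniform(0, 1) to keep the weights bounded. All function names are hypothetical.

```python
import numpy as np

def simulate(n, rng):
    # ASSUMED data generating process (covariate law and noise are our choices).
    x = rng.normal(0.0, 3.0, size=n)
    pi = 1.0 / (1.0 + np.exp(-(2.0 + 0.2 * x)))       # alpha_0 = 2, alpha_1 = 0.2
    d = rng.binomial(1, pi).astype(float)
    y = 10.0 + 5.0 * d + 0.2 * x + rng.normal(size=n)  # beta = (10, 5, 0.2)
    return x, d, y

def fit_ps(x, d, iters=25):
    # Logistic regression of D on (1, X) by Newton-Raphson; returns fitted scores.
    Z = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ b))
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        b += np.linalg.solve(hess, Z.T @ (d - p))
    return 1.0 / (1.0 + np.exp(-Z @ b))

def wls_ate(y, d, x=None, w=None):
    # (Weighted) least squares of Y on (1, D[, X]); returns the coefficient on D.
    A = np.column_stack([np.ones_like(y), d] + ([] if x is None else [x]))
    sw = np.sqrt(np.ones_like(y) if w is None else w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1]

def run_once(n=1000, seed=0):
    rng = np.random.default_rng(seed)
    x, d, y = simulate(n, rng)
    ps_ok = fit_ps(x, d)                       # correctly specified PS
    ps_bad = rng.uniform(0.05, 0.95, size=n)   # random PS (ASSUMED clipped range)
    kappa = lambda ps: d / ps + (1.0 - d) / (1.0 - ps)  # ASSUMED weight form
    return {
        "OR1": wls_ate(y, d, x),                    # correct OR
        "OR2": wls_ate(y, d),                       # OR with X excluded
        "DR1": wls_ate(y, d, w=kappa(ps_ok)),       # wrong OR, correct PS
        "DR2": wls_ate(y, d, x, w=kappa(ps_bad)),   # correct OR, wrong PS
        "DR3": wls_ate(y, d, w=kappa(ps_bad)),      # both wrong
    }

def monte_carlo(runs=200, n=1000, tau=5.0):
    # Mean, variance and MSE of the point estimates across simulated datasets.
    draws = [run_once(n, seed) for seed in range(runs)]
    out = {}
    for name in draws[0]:
        est = np.array([r[name] for r in draws])
        out[name] = (est.mean(), est.var(), np.mean((est - tau) ** 2))
    return out
```

Calling `monte_carlo()` returns, for each model, the mean, variance and MSE of the point estimates across runs, in the spirit of Table 1. The numbers should not be expected to match the table, since the assumptions above differ from our simulation design.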

4 Data and model specifications

4.1 Treatment and outcome variable

We have data on the location of fixed speed cameras for 771 camera sites in the following English administrative districts: Cheshire, Dorset, Greater Manchester, Lancashire, Leicester, Merseyside, Sussex and the West Midlands. These sites form our group of treated units. To select potential control sites we randomly sampled a total of 4787 points on the network within our eight administrative districts. The large ratio of potential control to treated units is adopted to ensure that we have a sufficient number of control units after we apply a matching algorithm.

Our outcome variable is the number of personal injury collisions (PICs) per kilometre as recorded from the location of the speed cameras, or in the case of control groups, from the randomly selected point. The PIC data are taken from records completed by police officers each time that an incident is reported to them. The individual police records are collated and processed by the UK Department for Transport as the ‘STATS 19’ data. The location of each PIC is recorded using the British National Grid coordinate system and can be located on a map using Geographical Information System (GIS) software. Because the installation dates of the speed cameras vary from 2002 to 2004, the period of analysis runs from 1999 to 2007 to ensure that collision data are available for the years before and after installation at every camera site.

4.2 Covariates

To adjust for confounding we require a set of measured covariates that adequately represent the characteristics of units that simultaneously determine treatment assignment and outcome. For the UK there exists a formal set of site selection guidelines for fixed speed cameras (see Gains et al. 2004) that are extremely valuable in choosing covariates. The criteria are as follows:

1. Site length: between 400-1500 m.

2. Number of fatal and serious collisions (FSCs): at least 4 FSCs per km in last three calendar years.

3. Number of personal injury collisions (PICs): at least 8 PICs per km in last three calendar years.

4. 85th percentile speed at collision hot spots: 85th percentile speed at least 10% above speed limit.

5. Percentage over the speed limit: at least 20% of drivers are exceeding the speed limit.

Criteria one to three are primary guidelines for site selection and criteria four and five are of secondary importance. Some sites that do not meet the above criteria are still selected as enforcement sites, mainly for reasons such as community concern and engineering factors.
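As a minimal sketch, the guideline thresholds listed above can be encoded as simple checks. The `Site` record, its field names and the two helper functions are hypothetical and serve only to make the thresholds concrete. They do not describe how sites were actually assessed, and as noted above some sites are selected without meeting the criteria.

```python
from dataclasses import dataclass

@dataclass
class Site:
    # Hypothetical record; field names are illustrative only.
    length_m: float          # site length in metres
    fsc_per_km: float        # fatal and serious collisions per km, last 3 years
    pic_per_km: float        # personal injury collisions per km, last 3 years
    speed_limit_mph: float
    p85_speed_mph: float     # 85th percentile speed
    share_over_limit: float  # proportion of drivers exceeding the limit (0 to 1)

def meets_primary(site: Site) -> bool:
    # Criteria one to three: site length and collision history.
    return (400 <= site.length_m <= 1500
            and site.fsc_per_km >= 4
            and site.pic_per_km >= 8)

def meets_secondary(site: Site) -> bool:
    # Criteria four and five: speed-based thresholds.
    # Multiplying through by 10 avoids floating-point issues at the 10% boundary.
    fast_p85 = site.p85_speed_mph * 10 >= site.speed_limit_mph * 11
    return fast_p85 and site.share_over_limit >= 0.20
```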

Selection of the speed camera sites was primarily based on collision history. Collision data can be obtained from the STATS 19 database and located on the map using GIS. However, data for the secondary criteria, such as the 85th percentile speed and the percentage of vehicles over the speed limit, are normally unavailable for all sites on UK roads. If speed distributions differ between the treated and untreated groups, then the failure to include the speed data could bias the estimation, an issue discussed in previous research (e.g. Mountain et al. 2005, Gains et al. 2004). For untreated sites with speed limits of 30 mph and 40 mph, the national average mean speed and percentage of speeding vehicles are similar to those at the camera sites. The focus of this study is on sites with speed limits of 30 mph and 40 mph throughout the UK. It is therefore reasonable to assume that there is no significant difference in the speed distribution between the treated and untreated groups and hence that exclusion of the speed data will not affect the accuracy of the propensity score model.

It is also possible that drivers may choose alternative routes to avoid speed camera sites. Collision reduction at camera sites may therefore include the effect of reduced traffic flow, and the benefits of speed cameras will be overestimated unless the change in traffic flow is controlled for. The annual average daily flow (AADF) is available for both treated and untreated roads, and the effect of traffic flow is controlled for in this study by including the AADF in the propensity score model.

In addition to the criteria that strongly influence the treatment assignment, factors that affect the outcomes should also be taken into account when the propensity score model is specified. We further include road characteristics such as road type, speed limit, and the number of minor junctions within the site length, which are suggested as important factors when estimating the safety impact of speed cameras (Gains et al. 2005, Christie et al. 2003).

4.3 Component model specifications

The outcome variable of interest is the number of collisions per site. For the OR model the response is specified in differenced form, i.e. the number of collisions in the post-treatment period minus the number of collisions in the pre-treatment period. Differencing allows for the existence of unit level time-invariant effects, which could be random or fixed. The PS model is estimated using a logit Generalized Additive Mixed Model (GAMM) specification. Matching and overlap are achieved using nearest neighbour matching via the MatchIt package in R. The weighted OR model is then estimated on the trimmed dataset, which satisfies matching and overlap conditions, using a Gaussian GAMM specification. We use GAMMs to avoid making a priori assumptions on the functional form of the relationships under study.
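To make the matching step concrete, the following is a minimal Python sketch of greedy one-to-one nearest neighbour matching on the estimated propensity score. It is not the MatchIt implementation used here: the function name, the caliper argument and its default value are assumptions introduced purely for illustration.

```python
import numpy as np

def nearest_neighbour_match(ps, treated, caliper=0.1):
    """Greedy 1:1 matching without replacement on the propensity score.

    ps       -- estimated propensity scores, one per unit
    treated  -- boolean indicator, True for treated units
    caliper  -- ASSUMED maximum allowed PS distance for a valid match
    Returns a list of (treated_index, control_index) pairs.
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    available = list(np.flatnonzero(~treated))  # controls not yet used
    pairs = []
    for i in np.flatnonzero(treated):
        if not available:
            break
        dist = np.abs(ps[available] - ps[i])
        j = int(np.argmin(dist))
        # Treated units with no control inside the caliper are left unmatched,
        # which trims units lacking overlap.
        if dist[j] <= caliper:
            pairs.append((int(i), int(available.pop(j))))
    return pairs
```

Treated units left unmatched, and controls never selected, would be dropped to form a trimmed sample. The differenced response for each retained site is then its post-treatment collision count minus its pre-treatment count.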

As mentioned in the introduction, the DR approach is particularly attractive for our application because of the differences inherent in our PS and OR model specifications. Due to the existence of formal criteria for SC assignment we have a high degree of confidence in the ability of our covariates to eliminate confounding via the PS model. For the OR model, differencing of the response variable before and after treatment allows for the existence of site level time-invariant unobserved effects in addition to measured confounders. Thus, there are subtle differences in the way we model the ATE via the PS or OR approaches. A degree of robustness is offered using a DR approach since we will obtain a consistent estimate of the ATE if just one of the component models is well specified.

5 Results

The objective of our application is to estimate the marginal effect of SCs on RTCs, having adjusted for baseline confounders. We estimate the following models: an OR model, an IPW model, a DR model comprising an OR model weighted with the inverse PS covariate (DR), and a naïve model which is simply the OR model without covariates. For the naïve model we report results using the matched and full samples. All models are repeatedly estimated using the approximate Bayesian approach outlined above. In addition to the posterior predictive distribution for the ATE we report point estimates at the mean of the posterior. For comparison, we also report Frequentist results.

The results are shown in Table 2 below, including means and credible intervals of the ATE distributions. Our causal models (OR, IPW and DR) indicate that the presence of speed cameras corresponds with an average change in the number of RTCs of -14.4% to -15.5%. Note that the approximate Bayesian and Frequentist point estimates are very similar, which is what we would expect for linear models with uninformative priors. In comparison, the naïve model, which does not adjust for confounding, finds an ATE of larger magnitude: -17.6% using the matched sample and -33.6% using the unmatched sample. Figure 1 below shows the posterior predictive distribution derived from the DR model.

Thus, it would appear that correcting for potential sources of confounding serves to reduce the magnitude of our ATE estimates, but we still find a substantial reduction in RTCs associated with the presence of speed cameras. The difference in estimated ATE between the naïve and causal models makes sense given that the formal criteria used to assign SCs favour sites that have exhibited high rates of collisions in the past. Crucially, our causal models imply that SCs do make a real difference to RTCs over and above the modelled effect of confounding from non-random assignment.

6 Conclusions

In this paper we have quantified the causal effect of speed cameras on road traffic collisions via an approximate Bayesian doubly robust approach. This is the first time such an approach has been applied to study road safety outcomes. The method we propose could be used more generally for estimation of crash modification factor (CMF) distributions. Simulations demonstrate that the approach is doubly-robust for average treatment effect estimation. Our results indicate that speed cameras do cause a significant reduction in road traffic collisions, by as much as 15% on average for treated sites. This is an important result that could help inform public policy debates on appropriate measures to reduce RTCs. The adoption of evidence based approaches by public authorities, based on clear principles of causal inference, could vastly improve their ability to evaluate different courses of action and better understand the consequences of intervention.

There are thus two important implications of our study that could ultimately improve highway safety. First, such inference could be employed to achieve a more effective assignment of SCs and a consequent reduction of RTCs. Second, the approach outlined above could be used to continually monitor SC effectiveness as baseline conditions (e.g. related to road traffic and wider demographic and social characteristics) change, thus providing a means of monitoring the effectiveness of road safety interventions dynamically.

Bibliography

1. Bang, H. and J. M. Robins (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–972.
2. Carnis, L. and E. Blais (2013). An assessment of the safety effects of the French speed camera program. Accident Analysis & Prevention 51, 301–309.
3. Chen, G., W. Meckle, and J. Wilson (2002). Speed and safety effect of photo radar enforcement on a highway corridor in British Columbia. Accident Analysis & Prevention 34, 129–138.
4. Christie, S., R. Lyons, F. Dunstan, and S. Jones (2003). Are mobile speed cameras effective? A controlled before and after study. Injury Prevention 9, 302–306.
5. Cunningham, C., J. Hummer, and J. Moon (2000). Analysis of automated speed enforcement cameras in Charlotte, North Carolina. Transportation Research Record 2078, 127–134.
6. De Pauw, E., S. Daniels, T. Brijs, E. Hermans, and G. Wets (2014). An evaluation of the traffic safety effect of fixed speed cameras. Safety Science 62, 168–174.
7. DfT (2016). Transport statistics Great Britain: 2016. Statistical release, UK Department for Transport, London.
8. DfT (2017). Reported road casualties in Great Britain: quarterly provisional estimates. Statistical release, UK Department for Transport, London.