A Shared Component Point Process Model for Urban Policing
Claire Kelling, Murali Haran

TL;DR
This paper introduces a shared component point process model that directly relates police use of force to other police events at the exact location level, offering a flexible alternative to traditional spatial aggregation methods.
Contribution
It develops a novel shared component modeling approach for two point processes, maintaining point-level detail and improving relationship characterization.
Findings
Shared component approach effectively models the relationship between police events.
Method outperforms traditional spatial aggregation techniques in simulations.
Application to Chicago data demonstrates practical utility.
Abstract
Newly available point-level datasets allow us to relate police use of force to other events describing police behavior. Current methods for relating two point processes typically rely on the spatial aggregation of one of the two point processes. We investigate new methods that build upon shared component models and case-control methods to retain the point-level nature of both point processes while characterizing the relationship between them. We find that the shared component approach is particularly useful in flexibly relating two point processes, and we illustrate this flexibility in simulated examples and an application to Chicago policing data.
Claire Kelling
Carleton College, Northfield, MN, United States.
Center for Math & Computing 225
Northfield, MN 55057
Murali Haran
Pennsylvania State University, State College, PA, United States.
Keywords: case-control, criminology, Gaussian process, marked point process, policing, shared component model
1 Introduction
Event-level data with precise location information on phenomena such as police use of force incidents, police stops, and violent crime are increasingly available in many urban areas. Determining the causes and consequences of police use of force presents a growing challenge to criminologists and public policymakers as excessive use of force by police persists over time. We consider rich datasets on policing from Chicago. Existing work often incorporates high-level spatial information to study police behavior, such as indicator variables for the district where events occur, rather than information about the exact locations (cf. Antonovics and Knight, 2009). Point process methods allow us to incorporate detailed information about the precise location of events. Additionally, integrating related phenomena can be helpful in studying police use of force. For example, the frequency and spatial distribution of both violent crimes and police stops may help characterize the spatial distribution of, and factors influencing, police use of force incidents. Police stops can give some preliminary information on where police are patrolling, and the prevalence of violent crime is part of the characterization of the communities where police are patrolling.
Many methods exist for analyzing the relationship between different point processes, ranging from descriptive statistics to parametric and nonparametric methods. For instance, case-control models are often used to analyze the relationship between two types of events without aggregating either type of point. Specifically, the intensity of one point process, the “control” process, is used to scale the intensity of a second point process, the “case” process. In this paper we develop a new shared component model for point processes that provides a rich framework for analyzing the spatial relationship between two point patterns and avoids specifying case and control processes. We compare our shared component approach to existing models for relating two point processes, particularly case-control models.
We summarize the main contributions of this paper below.
- We propose a shared component model for two point processes that allows for a spatial pattern that is unique to each point process as well as a pattern that is shared between the two point processes. This model builds upon the shared component model developed for areal data (Knorr-Held and Best, 2001). We find through simulated examples and application to Chicago policing data that our model is flexible and easy to interpret.
- We study the use of a case-control model for point processes through simulation studies and applications to Chicago policing data. We consider two methods of estimating the intensity functions and regression coefficients in this context: logistic regression (cf. Diggle et al., 2007) and Bayesian estimation of a spatial intensity function. We find that care must be taken in choosing the estimation procedure and corresponding interpretation for this class of models.
- We compare our shared component model to the case-control model and we find that the shared component model is computationally more complex but allows for additional flexibility and spatial structure when modeling the relationship between two point processes.
- Although our results are preliminary, we find that the shared component methodology allows us to effectively study the relationship between police use of force and police stops in Chicago. We illustrate a spatial pattern that is common to both point processes, south of downtown Chicago, and unique factors that impact the processes individually. Use of force events have a higher spatial intensity in the northern side of Chicago and police stops have a higher intensity in the southern side of the city, after accounting for a shared spatial pattern.
The paper is organized as follows. We describe areas of active research relating police use of force to other event datasets, such as police stops and violent crime, and the kinds of spatial analysis used in Section 2. In Section 3, we describe statistical methods for relating two point processes. We give background information on case-control methods for point processes and describe our new shared component model for point processes. In Section 4 we provide details about our Chicago data on police stops and police use of force used in this paper and also describe our simulated examples. We apply the case-control and shared component models to Chicago policing data and simulated examples in Section 5 and conclude with a discussion in Section 6.
2 Background
We begin by summarizing the kinds of policing data motivating this study along with related research questions in criminology. We also describe existing methodology that has been developed to analyze these datasets. Examples of research questions include the following: What are possible factors that influence police use of force and how can additional rich point-level datasets help us better understand the spatial distribution of police use of force and police behavior more broadly? In the first subsection, we describe research relating police use of force to police stop data. In the second subsection, we similarly describe studies relating police use of force to another point-level dataset: violent crime. Finally, we illustrate current approaches to spatial analysis of these datasets. Most of these studies rely on spatial aggregation, though there has been some recent work on methods that reduce the amount of aggregation.
2.1 Police Use of Force and Police Stops
Analysis of the spatial distribution of police stops can help indicate the baseline expectation for police use of force incidents. Although police stops do not represent all encounters with police, we can use police stops to give additional information about where use of force incidents are occurring at different rates from all police stops. There may be more use of force incidents in a given neighborhood not because of any characteristics of the neighborhood but rather because more police are patrolling the area. Ba et al. (2021a) show that when aggregating to large units over space and time, “observed behavioral differences may simply reflect differing patrol environments, rather than differences in policing approaches.” Proxies for police presence, such as the stops made by police, can be a useful tool in determining where police are patrolling, and can therefore help elucidate police use of force that is beyond what is expected from an increased police presence. Weisburst (2019) studies use of force incidents as extensions of arrests and compares demographic patterns in both arrests and subsequent uses of force in the same incident. Police stops allow us to create a closer proxy to police presence than arrests, as not all stops involve arrests.
We note that police stops do not give us complete information on all individuals that police observe, as not all police encounters involve stops. The decision of officers to make a stop, out of all individuals that they encounter during a given patrol, may be biased. Studies have shown that it is important to consider bias in police stops when analyzing bias in police use of force (cf. Knox et al., 2020a). Failure to account for uneventful shifts may lead to inaccurate inferences on biases in policing (Knox, 2021). However, for the purposes of this study, we are analyzing only the spatial distribution between police stops and police use of force, not bias in outcomes of force. This framework could be expanded in the future to incorporate information about officers and civilians into the spatial model, for example through the two-stage framework developed by Kelling and Haran (2022).
2.2 Police Use of Force and Violent Crime
Data on violent crime is a second useful tool in determining the potential causes and consequences of police use of force. The amount of violent crime in a given area creates a neighborhood context that may have an important impact on police use of force, as officers may bring prior knowledge about the neighborhood. Through a vignette approach and survey responses, Phillips and Sobol (2011) find that in an area with more violent crimes and higher crime rates as a percentage of the population, officers are more likely to perceive the use of unnecessary force as acceptable than in other areas, while controlling for other factors. In the criminology literature, there are hypotheses that places that have more violent crime and are disadvantaged are more likely to have more incidents of police use of force (Lawton, 2007; Terrill and Reisig, 2003). Specifically, there are many existing studies that find a weak positive relationship between violent crime and police use of force using the spatial distribution of violent crime (Lawton, 2007; Lee et al., 2010; Terrill and Reisig, 2003; Lee et al., 2014). Yet, there is a conflicting hypothesis that prevalence of violent crime does not lead to an increase in police use of force (Slovak, 1988). Other recent work shows that it may be “impossible” to determine whether violent crime in neighborhoods is a result or cause of policing in those neighborhoods (Simckes et al., 2021). We develop methods that allow us to analyze the relationship between different point processes, such as violent crime, police stops, and use of force, while preserving as much spatial and event-level information as possible and without specifying a causal direction that is implied by a case-control setup.
There are many possibilities for the “baseline process” to compare to use of force incidents, including violent crime, arrests, and stops, all of which provide slightly different information. For this study, we focus on the relationship between police stops and police use of force but this could be expanded in the future. We develop a method to study the impact of policing datasets on police use of force, while preserving the point-level nature of both datasets.
2.3 Use of Spatial Aggregation
When studying the relationship between violent crime and use of force, the spatial level of analysis for violent crime often varies between studies. Many studies use neighborhood-level measures of violent crime, such as the number of violent crimes or homicides per police district, command area, or city per a certain number of residents (Reisig and Parks, 2003; Lawton, 2007; Lee et al., 2010; Terrill and Reisig, 2003). This method is depicted for police stops in Chicago in Figure 1, where each use of force incident is affiliated with the count of police stops per census tract. No information about the police stop is preserved, other than the census tract where it is located.
Neighborhood-level violent crime rates have limitations, as neighborhoods are not spatially homogeneous, which suggests the importance of analysis at lower levels of aggregation. To address these limitations, Lee et al. (2014) use radial buffers to count the violent crimes within a certain radius of each use of force incident. The radial buffers range from 500 to 3,000 feet. These counts of violent crimes within a certain radius are then used as a covariate in the model, instead of a count per neighborhood. We illustrate this general method in Figure 2, where we show a subset of use of force incidents in a part of Chicago in red. The blue radial buffers are used to count the number of police stops (black) that occur within 1.5 km, in this case, of each use of force incident.
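The radial buffer covariate described above can be sketched in a few lines. This is a minimal illustration, assuming planar coordinates in kilometres and a brute-force distance search; the function name `buffer_counts` and the toy coordinates are hypothetical and are not taken from Lee et al. (2014) or from the Chicago data.

```python
import math

def buffer_counts(force_xy, stop_xy, radius):
    """Count stops within `radius` of each use of force location (planar coordinates)."""
    counts = []
    for fx, fy in force_xy:
        n = sum(1 for sx, sy in stop_xy if math.hypot(sx - fx, sy - fy) <= radius)
        counts.append(n)
    return counts

# Hypothetical coordinates in kilometres (not real policing data).
force_xy = [(0.0, 0.0), (5.0, 5.0)]
stop_xy = [(0.5, 0.5), (1.0, 1.0), (1.2, 1.2), (5.0, 6.0), (9.0, 9.0)]

# One count per use of force incident, using a 1.5 km buffer.
counts = buffer_counts(force_xy, stop_xy, radius=1.5)
print(counts)
```

Each count depends only on whether a stop falls inside the buffer, so the distance of each stop from the incident and any event-level attributes of the stop are discarded once the radius is fixed.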
Lee et al. (2014) run four separate multinomial logistic regression models at the four levels of radial buffers and find that the micro-level use of place (using the radial buffers) is important when modeling the relationship between violent crime and police use of force. The results indicate that smaller radial buffers lead to a larger positive relationship between violent crime and police use of force and that larger radial buffers lead to a possible ‘blur’ of the genuine effect of violent crime on police use of force. The findings from this work show that the relationship may be lost almost altogether when looking at the neighborhood level. In total, this suggests that using the smallest level of aggregation possible decreases the potential of losing information.
Radial buffers preserve some of the granularity of the violent crime/police stop locations when determining their relationship with police use of force. We can create buffers that are smaller than census tracts and the sizes of all buffers could be uniform, as in Lee et al. (2014), or they could vary based on variables such as population density. However, this method is still a form of aggregation, and therefore we lose many of the precise details about the violent crime incidents or police stops. For example, we have to fix the radial buffer, so after the size is set we cannot know how many crimes occur within both 10 feet and 500 feet of the given use of force incident. We also cannot preserve precise point or event-level information about the violent crime incidents or police stops. These could be variables such as the race and gender of the officer and/or civilian, if a gang was involved, if a weapon was involved, and if the incident was a hate crime. When we aggregate to radial buffers, we no longer have the ability to use event-level information, such as these, for the violent crime incidents or police stops in the point process model.
3 Methods
We develop a spatial point process approach as follows. Let the random locations of police use of force incidents be a point process with intensity function $\lambda_1(s)$, $s \in D$, where $D$ is the study region, for instance the city of Chicago. The random locations of the second point process, police stops in our application, can be related to the first point process using multiple methods. In case-control models this point process has baseline intensity function $\lambda_0(s)$, while in our shared component model it has a second intensity function $\lambda_2(s)$. In our shared component framework, we relate both point processes to spatial covariates, $X(s)$, through regression coefficients $\beta$. In case-control models, only the case process, use of force incidents in our data, is related to spatial covariates.
Our aim is to relate two or more point processes while preserving the point-level information of all datasets. In what follows, we describe relevant models in the literature, mostly focusing on marked point process models and summary or test statistics. Next, we present two classes of models that we explore in detail, the first being the well-established case-control model and the second being a novel use of a shared component model for point processes. We describe a Bayesian inferential approach for relating two point processes for both classes of models.
3.1 Existing Methods to Relate Multiple or Multitype Point Processes
Multitype point processes, where the marks are categorical labels, offer one approach to relating two or more point processes. In such point processes, the mark determines the type of point at a given location. The relationship between types of points, determined by the categorical mark, is often modeled through a cross-covariance function in a bivariate point process or tests of similarities of first-order characteristics. For example, Fuentes-Santos et al. (2017) propose a nonparametric comparison of multitype point processes based on first-order properties, which describe the spatial distribution in the area of interest. The methods are used to analyze different types of wildfires in Spain based on two characteristics: size (small, regular, large) and cause (arson, natural, negligence, reproductions, and unknown). Berman (1986) tests the relationship between a spatial point process and other spatial stochastic processes. Innovative spatial data types are used with the goal of analyzing the relationship between copper deposits (a point process) and linear features observed in the region, called ‘lineaments’; these are often roads or perceived breaks in the earth’s crust, as observed by a satellite. Berman (1986) develops a test statistic for testing the relationship between the two processes based on the distances from the points of the point process (copper deposits) to the nearest point in the stochastic process (lineament). The tests proposed by Fuentes-Santos et al. (2017) and Berman (1986) are similar to other recently developed multitype test statistics (cf. Illian et al., 2008; Møller and Waagepetersen, 2003). These tests are useful exploratory tools and can motivate further exploration through parametric models of the intensity of point processes.
Mohler (2014) uses marked point process models to create hotspot maps. The categorical mark for this point process framework specifies different crime types, including homicide and crime types that may precede homicide. Mohler (2014) determines the probability that an event triggered a given homicide event, where event could be another homicide or a different crime type, through a self-exciting point process framework. All of the crimes that are not homicides involve a handgun; these data are used to analyze the relationship between gun crimes and the prediction of future homicide. This innovative space-time approach incorporates two or more marks, which is useful when considering many marks. The approach relies on short and long-term kernel density estimates whereas we pursue a parametric approach to evaluate the relationship between two point processes. A triggering function may be a useful tool in future analysis to incorporate the temporal dimension of both point processes.
Liang et al. (2008) develop a bivariate marked point process model that allows for dependence across levels of a mark within the point process model. The model includes spatial variables and nonspatial variables, each with their own regression coefficients, and a Gaussian process. The log spatial intensity function for each mark level of the point process is defined in terms of these components. The mark considered by Liang et al. (2008) is cancer type, where two cancer types are considered. This model allows flexible spatial dependence across the two mark levels through a cross-covariance function for the Gaussian processes. Kelling and Haran (2022) find that interpretation is difficult due to the use of nonspatial variables in the spatial intensity function. Furthermore, analysis of the dependence between the two mark levels is limited to the dependence parameter of the Gaussian processes, which enters the cross-covariance function defined below. This parameter gives information on the strength of the dependence in the spatial residual pattern, but we aim to analyze the relationship between two point processes in more detail.
[TABLE]
3.2 Case-Control for Point Processes
Case-control methods represent a common class of models used to compare two point processes. Guan et al. (2008) use case-control point process methods to relate a 1990 survey of birds (the baseline/control process) to a more recent survey in 2004 (the main/case process). The results show that the spatial distribution of golden plovers has changed over time in relation to spatial covariates, namely slope and cotton grass coverage. Diggle et al. (2007) use case-control methods to study the relationship between juvenile and adult trees in a tropical rain forest. Juvenile trees are treated as the controls, which represent underlying environmental variation, and the adults are the cases whose spatial distribution is impacted by spatial conditions affecting survival, such as elevation. Guan et al. (2010) also study the relationship between trees in tropical rain forests, where dead juvenile trees are treated as cases and new trees are treated as controls in order to study the mortality of juvenile trees. Spatial covariates used in this analysis include altitude, slope, and information on the soil content. The second example studied by Guan et al. (2010) includes the golden plover data from Guan et al. (2008) with spatial variables altitude, slope, and percent cover of heather and cotton grass. Diggle et al. (2000) and Chetwynd et al. (2001) extend case-control methods to matched case-control data, where a set of matched controls is assigned to each case based on potentially relevant confounders.
To introduce the notation for case-control methods, the intensity function for the main point process of interest at location $s$, called the case process, is denoted $\lambda_1(s)$. The intensity for the baseline process at location $s$, or the control, is denoted $\lambda_0(s)$. Spatial variables are denoted $X(s)$ with corresponding regression coefficients $\beta$, and the intercept term is denoted $\beta_0$. The full intensity function for location $s$ used in case-control point processes is as follows (cf. Diggle et al., 2007):
$$\lambda_1(s) = \lambda_0(s)\,\exp\{\beta_0 + X(s)^\top \beta\}$$
A slightly different parameterization is often used where the intercept term scales the overall intensity function as follows: $\lambda_1(s) = \rho\,\lambda_0(s)\exp\{X(s)^\top \beta\}$ (cf. Diggle, 1990; Diggle and Rowlingson, 1994; Guan et al., 2008). In this formulation, $\lambda_0(s)$ is the intensity of the population at risk and $\rho$ is a scaling parameter which represents the prevalence of the cases relative to controls and is often not considered to be of primary scientific interest (Diggle and Rowlingson, 1994). The remaining portion of $\lambda_1(s)$, $\exp\{X(s)^\top \beta\}$, determines the elevated risk of the cases as a function of spatial variables. In these case-control models, no parametric form is assumed for the baseline intensity $\lambda_0(s)$.
In many cases direct parametric estimation of the full spatial intensity function of the case point process is avoided because of complications introduced through estimation of the baseline intensity $\lambda_0(s)$. Guan et al. (2008) estimate the pair correlation function for the main/case process in order to characterize the second-order structure, for example clustering, of the main process after accounting for the baseline process. Guan et al. (2010) develop both nonparametric and parametric ways to study the second-order structure of the main point process. Methods are proposed to estimate regression coefficients for the main process without estimation of the control intensity $\lambda_0(s)$. The simulation study fixes the regression coefficients, $\beta$, but does not evaluate the ability of the estimation procedure to recover the regression parameters. Rather, the simulation study is focused on the bias and standard deviation of the estimators for the pair correlation function in order to evaluate clustering. We also note that independence is assumed between the case and the control process. Diggle and Rowlingson (1994) also avoid estimation of $\lambda_0(s)$ through the use of a non-linear binary regression model to estimate the regression parameters.
Some case-control methods pursue nonparametric estimation of the baseline intensity function through kernel density estimates, which are then used to estimate the intensity of the case process, $\lambda_1(s)$ (Diggle et al., 2000, 2007). This is useful when we would like to visualize and interpret the full spatial intensity function, $\lambda_1(s)$. Diggle et al. (2007) propose a simple estimation procedure for the regression coefficients in the case-control model without estimation of the control intensity, $\lambda_0(s)$. Coefficients $\beta_0$ and $\beta$ are estimated through logistic regression where $Y(s)$ is binary and indicates whether a given location is a case (1) or control (0). The logistic regression takes the following form: $\mathrm{logit}\,P(Y(s) = 1) = \beta_0 + X(s)^\top \beta$. If it is desired to estimate the case intensity, $\lambda_1(s)$, the control intensity is estimated through a kernel density estimate. The results from logistic regression and the kernel density estimate combined create the estimate of the intensity function $\lambda_1(s)$. Diggle et al. (2007) interpret regression coefficients $\beta$ as the effects of $X(s)$ on the relative intensity of cases to controls at location $s$. The intercept $\beta_0$ is interpreted as the “chosen ratio” between cases and controls.
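The logistic regression step can be illustrated on simulated data. The sketch below is not the procedure used in the paper (which relies on the glm function in R); it assumes a unit-square study region, a homogeneous control process, a single hypothetical covariate equal to the first coordinate, and a hand-written Newton-Raphson fit in NumPy. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = 2.0  # hypothetical regression coefficient

# Controls: homogeneous pattern on the unit square (constant baseline intensity).
controls = rng.uniform(size=(2000, 2))

# Cases: intensity proportional to exp(beta_true * x), simulated by thinning.
proposals = rng.uniform(size=(20000, 2))
keep = rng.uniform(size=20000) < np.exp(beta_true * (proposals[:, 0] - 1.0))
cases = proposals[keep]

# Pool the points; y = 1 for a case and y = 0 for a control.
x = np.concatenate([cases[:, 0], controls[:, 0]])
y = np.concatenate([np.ones(len(cases)), np.zeros(len(controls))])
X = np.column_stack([np.ones_like(x), x])  # intercept and covariate

# Maximum likelihood by Newton-Raphson (a binomial GLM with logit link).
coef = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ coef))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    coef = coef + np.linalg.solve(hess, grad)

beta0_hat, beta_hat = coef
print(beta0_hat, beta_hat)
```

The baseline intensity never appears in the fit: conditional on a point being observed at a location, the probability that it is a case depends only on the ratio of the case and control intensities, which is why the slope can be estimated without a kernel estimate of the control process.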
Hessellund et al. (2021) consider a multivariate extension of the case-control logistic regression model illustrated in Diggle et al. (2007). Importantly, this multivariate case-control model does not assume independence between the various point processes. Hessellund et al. (2021) analyze six types of street crimes in Washington DC: Robbery, Auto Theft, Vehicle Theft, Assault, Burglary, and Other Theft. The intensity for each point pattern is defined as the product of a common background intensity and a term that depends on spatial covariates at each location through regression coefficients specific to each point type. The background intensity is interpreted as the “spatial effects of latent factors such as the urban structure and population density and is assumed to be common for all point types” (Hessellund et al., 2021). The parameters are shown to be not identifiable, so the regression coefficients are defined relative to those of a baseline point process. The regression coefficients for the baseline point process are set to 0 so that the background intensity is the intensity of the baseline process. In the application to the Washington DC street crimes, the ‘Other Theft’ category of crimes is set to be the baseline category and all other categories are compared to this point process.
Hessellund et al. (2021) once again avoid estimation of the baseline intensity when estimating regression coefficients through the use of multinomial logistic regression, similar to that of Diggle et al. (2007), which does not depend on the background intensity. For interpretation of the Washington DC street crimes, Hessellund et al. (2021) plot conditional probability maps based on this multinomial logistic regression. In order to plot the estimated intensities, the authors then estimate the background intensity through a kernel estimate.
Xu et al. (2019) introduce a modified version of the case-control model where a spatially varying control process is scaled by a sampling scheme, which is often assumed to be known. For the analysis of restaurants in Beijing in Xu et al. (2019), a uniform 6% sampling rate is assumed for the controls. The intensity for controls and the intensity for cases are both dependent on a common underlying process. Once again, the estimation of this underlying process is avoided due to the strategic use of the proportional intensity functions to estimate regression coefficients. The term is denoted an “infinite-dimensional nuisance parameter” due to concerns over its inconsistent estimation and the effect on inference for regression coefficients. Independence is assumed between the case and control processes.
In our analysis of simulated examples and policing data from Chicago, we consider two methods of estimation for the case-control model: logistic regression and Bayesian estimation of the spatial intensity function of a nonhomogeneous Poisson process. All models in our paper are implemented through a Bayesian hierarchical framework, except for logistic regression, which is estimated by maximum likelihood using the glm function in R.
3.3 Shared Component Model for Point Processes
We develop a shared component model for point processes, building upon the areal data framework in Knorr-Held and Best (2001). Similar to case-control models, we scale the intensity function by a common term. However, in our case, this shared component is weighted by the parameter $\delta$ and contributes to both point processes, rather than representing the intensity of one point process. This shared component is inferred from the data, rather than calculated from nonparametric methods or ignored, as is often the case with the control process in case-control methods. Knorr-Held and Best (2001) study the spatial distribution of two different types of cancer, oral cavity and oesophagus, which have been shown to have common and unique risk factors. Similarly, we would like to infer the shared spatial pattern between multiple types of policing data, including police use of force, police stops, and violent crime incidents.
Our shared component model is shown in Equation 2, where $\lambda_1(s)$ denotes the spatial intensity of one point process and $\lambda_2(s)$ denotes the spatial intensity of the second point process. The spatial variables for each point process, $X_1(s)$ and $X_2(s)$, and their corresponding regression coefficients, $\beta_1$ and $\beta_2$, allow us to gain knowledge about the spatial pattern that is unique to each individual point process, rather than shared between them. For example, we may assume that population effects may largely be captured by the shared component, while other neighborhood effects may be unique to each point process. From our simulation studies, we have found that we can include the same spatial variables in both intensity functions and recover the corresponding regression coefficients accurately, although this should be tested in more detail for each application and context.
[TABLE]
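As a reading aid, one way to write a model that matches the verbal description in this section is sketched here. The symbol $\theta(s)$ for the shared component and the multiplicative power form are assumptions made for illustration, and the authors' Equation 2 may differ in notation or detail.

```latex
\begin{align*}
\lambda_1(s) &= \theta(s)^{\delta}\,\exp\{X_1(s)^\top \beta_1\}, \\
\lambda_2(s) &= \theta(s)^{1-\delta}\,\exp\{X_2(s)^\top \beta_2\},
\qquad s \in D,\quad \theta(s) > 0,\quad 0 < \delta < 1 .
\end{align*}
```

Under this sketch, a value of $\delta$ near 1 means that the shared surface mostly shapes the first process, and a value near 0 means that it mostly shapes the second. On the log scale the shared term enters each intensity additively, with weights $\delta$ and $1-\delta$ on $\log \theta(s)$.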
There are many variations of the shared component model presented in Equation 2 that could be considered. The simplest version of the model is to consider a shared component that does not vary over space, taking the same value for all locations $s \in D$. Through simulation studies, we have found that in this case we cannot include intercept terms in the intensity for either point process, as they are confounded with the shared component, which is essentially a shared intercept term between the two point processes. For our analysis, we focus on a spatially varying shared component, which is modeled with a Gaussian process. We estimate the Gaussian process using predictive process transformations due to the computational burden of estimating an $n \times n$ covariance matrix (cf. Banerjee et al., 2008). We estimate the Gaussian process over a small number of knots and then transform this Gaussian process to the data points using the covariance function between the knots and the data points. For our study, we use 82 knots evenly distributed over the region, as shown in Figure 7. The Gaussian process over the knots ($w^*$) is transformed to the estimate of the Gaussian process over the data points ($\tilde{w}$) using the covariance function between the knots and between the points and the knots, as shown below.
[TABLE]
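The knot-to-data transformation described above can be sketched in a few lines. This is an illustrative sketch only, not the implementation used in the paper: it assumes an exponential covariance with hypothetical parameters `sigma2` and `phi`, uses a made-up 3 x 3 grid of knots rather than the 82 knots of the application, and includes a small hand-written linear solver so that it runs with the Python standard library alone.

```python
import math
import random

def exp_cov(a, b, sigma2, phi):
    """Exponential covariance between two 2-D locations (assumed form)."""
    return sigma2 * math.exp(-math.dist(a, b) / phi)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def predictive_process(points, knots, w_knots, sigma2, phi):
    """Map process values at the knots to the points: c(s, knots) C_knots^{-1} w_knots."""
    C_knots = [[exp_cov(k1, k2, sigma2, phi) for k2 in knots] for k1 in knots]
    alpha = solve(C_knots, w_knots)  # C_knots^{-1} w_knots, computed once
    return [sum(exp_cov(s, k, sigma2, phi) * a for k, a in zip(knots, alpha))
            for s in points]

# Hypothetical 3 x 3 grid of knots on the unit square.
knots = [(x / 2, y / 2) for x in range(3) for y in range(3)]
rng = random.Random(1)
w_knots = [rng.gauss(0, 1) for _ in knots]  # stand-in for sampled knot values
points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4)]
w_points = predictive_process(points, knots, w_knots, sigma2=1.0, phi=0.3)
```

Because the second point coincides with the central knot, the transformed value there equals the knot value, which is a convenient sanity check. Only the small knot-by-knot system is solved, which is the source of the computational savings.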
There are also multiple choices for the parameterization of the weighting parameter. We focus our studies on two parameterizations of the weights on the shared component for the two point processes: one in which the weighting parameter is bounded between 0 and 1, as shown in Equation 2, and one in which it is not. In the first case, we use a Uniform(0,1) prior for the weighting parameter. Other bounded distributions, such as a Beta distribution, could also be used. In the second case, the log of the weighting parameter has a Normal prior distribution, as suggested by Knorr-Held and Best (2001). Overall, we find that the bounded parameterization shown in Equation 2 produces more reliable parameter estimates and is easier to interpret than the unbounded alternative. We note that we have constrained the weighting parameter to be positive in both cases so that the shared component has a positive contribution to both point processes. Although the shared component itself must be positive, other parameterizations of the weights could allow a negative contribution of the shared component to the point process intensity function.
The likelihood for the shared component model is included below. To evaluate this likelihood, we must approximate the integral of each point process intensity function over the region of interest. We approximate this integral using Monte Carlo averages of the values of each intensity function over a set of integration points, described in Section 8.1. The intensity functions for both point processes are then multiplied over all of the observed points of the corresponding point process.
[TABLE]
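To make the structure of this likelihood concrete, the following sketch evaluates a Poisson process log-likelihood in which the integral of the intensity is replaced by the region's area times the average intensity over integration points. It is a minimal illustration under stated assumptions: the intensity functions `lam1` and `lam2`, the regular grid of integration points, and the toy data are all hypothetical, and the joint log-likelihood is taken to be the sum of the two processes' log-likelihoods given their intensities.

```python
import math

def nhpp_loglik(points, intensity, integration_points, area):
    """Poisson process log-likelihood with an averaged estimate of the integral."""
    log_term = sum(math.log(intensity(s)) for s in points)
    mean_intensity = (sum(intensity(u) for u in integration_points)
                      / len(integration_points))
    return log_term - area * mean_intensity

def joint_loglik(points1, points2, intensity1, intensity2, integration_points, area):
    """Sum of the two log-likelihoods, given the two intensity functions."""
    return (nhpp_loglik(points1, intensity1, integration_points, area)
            + nhpp_loglik(points2, intensity2, integration_points, area))

# Toy example on the unit square with hypothetical intensities.
grid = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]
lam1 = lambda s: 50.0                          # constant intensity
lam2 = lambda s: math.exp(1.0 + 2.0 * s[0])    # log-linear in the x-coordinate
pts1 = [(0.2, 0.3), (0.7, 0.1)]
pts2 = [(0.9, 0.5)]
ll = joint_loglik(pts1, pts2, lam1, lam2, grid, area=1.0)
```

In a model with a shared component, each intensity function would also depend on the shared component evaluated at the data and integration points; that dependence is hidden inside the hypothetical callables here.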
Finally, we could consider different parameterizations of the part of the intensity function that is specific to each point process. For example, we could consider different combinations of spatial variables across the two intensity functions, with either different or identical variables between point processes. This is an advantage of our shared component method; case-control methods do not incorporate parametric estimation of the intensity of the control process. In the parameterization shown in Equation 2, the part of each intensity function that is not shared between the point processes takes the form of a nonhomogeneous Poisson process (NHPP). Other forms could be explored in the future, such as a log Gaussian Cox process (LGCP). In that case, care must be taken to avoid confounding between the shared component and the Gaussian process specific to each point process. We note that when both point processes are modeled as NHPPs, we find the most reliable estimation when an intercept term is included for only one of the point processes, not both.
Our shared component model does not assume that one process is a “baseline” process while the other is the “case” or “main” process. This has many advantages, especially when the direction of causality is not known. With regard to policing, for example, many of the studies discussed above argue that violent crime provides neighborhood context that may affect policing behavior. However, other work, such as the Broken Windows theory, posits that policing behavior can also affect the prevalence of violent crime. Specifically, the Broken Windows theory states that if “soft” crimes are tolerated by police, more criminals may commit crimes in that area (Wilson and Kelling, 1982). Increased police presence in one neighborhood has also been associated with the spatial displacement of crimes to other areas (cf. Ratcliffe, 2002). Given that policing behavior may affect crime and crime may affect policing behavior, our shared component model is advantageous in this analysis. We note that the shared component model can be written as a version of the case-control model in which the scaling process depends on the intensity function of the second process, the weighting parameter defined above, and the spatial covariates and coefficients for the second point process. This is discussed in more detail in Section 8.4.
We describe two interesting case-control based approaches and compare their work to the shared component model that we develop. In the example of tropical rain forest data provided by Xu et al. (2019), there is not an apparent control process available to study the three tree species. Therefore, a homogeneous Poisson process is used as the control process and all three tree species are treated as cases compared to the uniform control. Xu et al. (2019) note that the size of the homogeneous Poisson process, determined by a varying where , actually has an effect on coefficient estimation for the case processes. The shared component model that we develop here avoids specifying one process as the control process. In the simulated case-control example presented by Guan et al. (2010), the authors assume the control intensity takes the following form: . The case process intensity is assumed to take the following form: The process is assumed to be known, while is assumed to be unknown. The term is similar to an unweighted shared component between the two point processes from our shared component framework.
4 Data
We evaluate the case-control and shared component methods through simulated data as well as policing datasets from Chicago. In this section, we discuss the generation of our simulated datasets and the policing datasets from Chicago.
4.1 Simulated Data
We use spatial thinning to generate all of the simulated point processes in this paper. First, we simulate a homogeneous Poisson process over the spatial window whose expected number of points is the area of the window multiplied by the maximum possible intensity over the window. The point process is then thinned: each point is kept with probability equal to the intensity at that point divided by the maximum intensity in the region. The spatial intensity functions depend on one or more spatial variables, which are depicted in Figure 3. All spatial variables and point processes are simulated over the unit square.
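The thinning procedure can be sketched as follows. This is a minimal illustration, not the code used for the paper's simulations: the log-linear intensity and its coefficients `beta0` and `beta1` are hypothetical, and the Poisson draw is written out by hand so that the example depends only on the Python standard library.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson draw: number of unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_nhpp(intensity, lambda_max, rng):
    """Simulate an NHPP on the unit square by thinning a homogeneous process."""
    n = sample_poisson(lambda_max * 1.0, rng)  # window area is 1
    candidates = [(rng.random(), rng.random()) for _ in range(n)]
    # Keep each candidate with probability intensity(s) / lambda_max.
    return [s for s in candidates if rng.random() < intensity(s) / lambda_max]

# Hypothetical intensity that increases with the x-coordinate.
beta0, beta1 = 4.0, 1.5
intensity = lambda s: math.exp(beta0 + beta1 * s[0])
lambda_max = math.exp(beta0 + beta1)  # maximum over the unit square, at x = 1
rng = random.Random(42)
points = simulate_nhpp(intensity, lambda_max, rng)
```

The bound `lambda_max` must be at least as large as the intensity everywhere in the window; a loose bound remains valid but discards more candidate points.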
For case-control models, we simulate a baseline (control) process using a parametric intensity function, in order to give the point process a nonhomogeneous distribution over space. This parametric intensity function appears in Equation 1. The intensity of the first point process and the spatial variables are then used to simulate a second point process based on its full spatial intensity, as shown in Equation 1. Importantly, we simulate the case-control data from a spatial intensity model specification, not from a logistic regression, so we expect estimation based on the spatial intensity function to perform well when recovering parameters.
For the shared component models, the two point processes are simulated simultaneously instead of sequentially. First, we simulate a Gaussian process to serve as the log of the shared component, so that the shared component is always greater than zero. The Gaussian process has a univariate exponential covariance function, so that the covariance between two locations decays exponentially with the distance between them. The spatial range parameter of this covariance function is fixed in our simulation studies and real data application and is not estimated from the data, as is common in other studies due to identifiability issues (cf. Liang et al., 2008). We calculate the value of the range parameter for which the 95th percentile of distances between all points would have a correlation of 0.05, and the value for which the 5th percentile of distances would have a correlation of 0.95. We fix the range parameter at the average of these two values. The remaining covariance parameter is also fixed when simulating data but is treated as unknown when estimating model parameters. After simulating the Gaussian process, we fix the weight parameter, which determines the contribution of the shared component to each of the two point processes, as well as the regression coefficients for each point process. Finally, we use spatial thinning to generate the two point processes using their corresponding spatial intensity functions, as shown in Equation 2.
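The rule for fixing the covariance range can be illustrated with a short calculation. The sketch assumes that the correlation at distance d is exp(-d / phi), which is one common way to write an exponential covariance and may differ from the parameterization used in the paper; the names `phi_far` and `phi_near` and the simulated locations are hypothetical.

```python
import math
import random
from itertools import combinations

def percentile(sorted_values, q):
    """Linear-interpolation percentile of a sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def fixed_range(points):
    """Average of the two range values implied by the two correlation targets."""
    dists = sorted(math.dist(a, b) for a, b in combinations(points, 2))
    d95, d05 = percentile(dists, 0.95), percentile(dists, 0.05)
    phi_far = -d95 / math.log(0.05)   # correlation 0.05 at the 95th percentile distance
    phi_near = -d05 / math.log(0.95)  # correlation 0.95 at the 5th percentile distance
    return (phi_far + phi_near) / 2, phi_far, phi_near

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]  # synthetic locations
phi, phi_far, phi_near = fixed_range(pts)
```

Under this assumed parameterization each target is solved in closed form, and the fixed value is the midpoint of the two solutions.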
4.2 Chicago Policing Data
Next, we describe the police use of force and police stop datasets used in this study. Ba et al. (2021a) acquired detailed data from the Chicago Police Department through the use of open-records requests and appeals. We utilize the replication data posted through the Code Ocean capsule (Ba et al., 2021b). Both the use of force data and the police stop data cover 2012-2015. We note that there is a large number of events that do not have coordinates (latitude/longitude) given for the events. Specifically, 18.8% of use of force events and 14.6% of police stops do not have coordinates available. In this analysis, we remove these points, but future work may require additional investigation into the missingness of these points.
The use of force dataset from the Chicago Police Department includes 9,293 incidents from January 2012 through December 2015, 7,539 of which have complete location information. The data also includes information about individuals involved in the event such as civilian race, gender, age, and injury status as well as the officer ID. The dataset of police stops includes 1,703,158 incidents from January 2012 through December 2015, 1,453,832 of which have complete spatial information. The dataset also includes information such as the type of stop, the civilian race, gender, and age, and the officer ID. We plot both datasets in Figure 4 at the areal level, for ease of visualization. We notice that the two outcomes share a spatial pattern, with smaller counts on the border of the city and larger counts in the southern center of the city. Note that we plot the data at the areal level only because hundreds of thousands of points are difficult to visualize; our analysis is at the point level in continuous space.
In addition to the two policing datasets analyzed for Chicago, we also collect socioeconomic information from the US Census and the American Community Survey. We use the census tracts that are completely contained within the police beats, the latter of which were downloaded from Chicago’s Open Data Portal. We focus our attention on three socioeconomic variables gathered or calculated from the census data: median age, unemployment rate, and the Herfindahl Index for neighborhood diversity. We plot the Herfindahl Index for Chicago census tracts in Figure 4. We note that the models we introduce here are flexible to different choices of covariates, depending on the application and research questions.
5 Results
We compare the case-control and shared component models using both simulated datasets and real policing data from Chicago. We test many different parameter settings for the case-control model to illustrate that commonly used estimation methods should be interpreted with caution. We also apply the shared component method to simulated data and policing data from Chicago and find that it provides a flexible framework for analyzing the relationship between two point processes.
For estimation of all parameters, we use Markov Chain Monte Carlo (MCMC) implemented through the NIMBLE package in R (de Valpine et al., 2017). Our approach relies on Metropolis-Hastings adaptive random-walk samplers with univariate normal proposal distributions for the regression coefficients (Normal(0,100)) (de Valpine et al., 2017). We use an Inverse-Gamma prior for the parameter associated with the covariance function of the shared component, as in Liang et al. (2008). We assess convergence through trace plots, effective sample size (Gong and Flegal, 2016), and Monte Carlo standard error (Haran and Hughes, 2020; Flegal et al., 2021).
5.1 Results for Simulated Data
Through simulation studies, we have found that the case-control model presents some challenges with interpretation depending on model assumptions. We simulate from the case-control model as a spatial intensity function with many different simulated parameter settings for the regression coefficients. To do this, we first simulate data from one point process. We then simulate a second nonhomogeneous Poisson process (NHPP) based on the spatial intensity function described in Equation 1, using the true intensity function for the baseline point process. In practice, we often do not know the functional form of the baseline intensity, so we must estimate it using a kernel density estimate (KDE) or avoid its estimation altogether. We discuss two possible estimation procedures below.
After simulating the case and control NHPPs from the spatial intensity functions corresponding to the case-control model, we use two estimation methods, the spatial NHPP and logistic regression, to test parameter recovery for the case intensity function. In the first case, we estimate the spatial NHPP parameters through a Bayesian approach with the following likelihood function, in which the case intensity function includes the regression coefficients and the control intensity. We use independent mean-zero normal priors for the regression coefficients. We do not assume any knowledge of the baseline intensity function, and therefore use a KDE of the control intensity function when estimating the case intensity function parameters.
[TABLE]
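As an illustration of the kernel step, the sketch below builds a kernel estimate of an intensity function from a set of control locations. The kernel, the bandwidth, and the absence of any edge correction are all assumptions made for this example; the text does not specify these choices, and the control locations are invented.

```python
import math

def kernel_intensity(points, bandwidth):
    """Return a function s -> estimated intensity using an isotropic Gaussian kernel."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)

    def lam_hat(s):
        return sum(
            norm * math.exp(-((s[0] - x) ** 2 + (s[1] - y) ** 2)
                            / (2.0 * bandwidth ** 2))
            for x, y in points
        )

    return lam_hat

# Hypothetical control locations on the unit square.
controls = [(0.3, 0.4), (0.35, 0.45), (0.6, 0.5), (0.7, 0.7), (0.5, 0.3)]
lam0_hat = kernel_intensity(controls, bandwidth=0.1)
value_near_cluster = lam0_hat((0.32, 0.42))
value_far_away = lam0_hat((0.95, 0.05))
```

Unlike a density estimate, this estimate integrates to the number of control points rather than to one, so it is on the scale of an intensity. Any error in it carries over to the estimated case intensity parameters.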
For the second approach, we use logistic regression as described in Section 3.2. This approach avoids the use of the baseline intensity function when estimating regression coefficients. If these two methods of estimating parameters from the same data result in similar parameter estimates, then we have flexibility between the estimation methods and corresponding interpretation.
As shown in Table 1, we find that when all parameters are positive, the logistic regression approach is able to recover regression parameters well. There are challenges when using the NHPP approach to estimate regression parameters, due to the use of a KDE estimate of the baseline intensity function. When we include negative parameters, as shown in the second simulated example, we are not able to recover the intercept parameter well with logistic regression and some other parameter estimates are also impacted. We test the sensitivity of these results to scaling (for example, using an intercept parameter of -1 instead of -10) and still find challenges in estimating using logistic regression. We include the credible intervals for the NHPP estimated using MCMC and confidence intervals for the logistic regression, estimated through the glm function in R. From these results based on simulated data, logistic regression presents a promising alternative to estimation with a NHPP but can still present challenges in some settings.
In a second set of simulated examples, we assume we have some knowledge of the baseline intensity function and test if the NHPP method can accurately estimate parameters with these assumptions. We simulated data from the control process with intensity function and the case process with intensity function . We assume that we know the form of the baseline intensity function and estimate the regression coefficients corresponding to both the case and the control intensities. We note that the intercepts are not identifiable and we estimate the sum of the intercept parameters for both point processes. In Table 2, we show that when we assume we know some information about the structure of the baseline intensity, we are able to estimate the coefficients accurately using the NHPP approach. In practice, specifying a structure for the baseline intensity may not be possible.
From these findings, we suggest caution when interpreting results from both the NHPP and logistic regression estimation methods (Diggle et al., 2007) if the assumption is that these results should be interpreted as a spatial intensity function. We find that these two estimation procedures can produce different results, so it is important to consider the desired interpretation of these parameters. Logistic regression has many advantages, including avoidance of estimating the baseline intensity function when estimating regression coefficients and computationally efficient estimation. We have shown some advantages of this approach when compared to the full Bayesian estimation of the NHPP intensity function. We have also shown an advantage of the NHPP method when we are able to assume some structure for the baseline intensity function, which may not always be possible. These advantages of both methods should be carefully considered alongside estimation and interpretation abilities and goals.
Next, we evaluate the shared component model proposed in this paper with simulated data examples. We consider two cases for the distribution of the weight parameter . In the first case, the shared component in the intensity of the first point process is weighted with and the shared component in the intensity of the second point process is weighted with . In this case, we use a Uniform(0,1) distribution as the prior for . In the second case, the contribution of the shared component to the intensity of the second point process is weighted by , instead of , and the prior on is a Normal distribution. The choice of the Normal distribution for the second case is motivated by the use of this distribution by Knorr-Held and Best (2001) in the shared component model for areal data.
From the simulated cases of the shared component model, we find that the shared component model with the Uniform prior for the weighting parameter provides simpler interpretation and more reliable parameter estimates. We simulate from both of these weighting schemes and analyze the posterior estimates of the parameters. In Appendix Section 8.2, we include plots of these point processes, which are simulated with identical parameters but different weighting distributions. In Table 3, we see that the credible intervals for all parameters contain the true values under the weighting scheme paired with the Uniform prior. Under the weighting scheme paired with the Normal prior, some parameters, namely the shared component covariance function parameter and the intercept of the second point process, are not recovered accurately in the simulation studies. Therefore, we suggest using the Uniform prior and its associated weighting scheme for the shared component model.
When analyzing simulated data for the shared component case, we also note that when an intercept is included for both point processes, the two intercepts are confounded and cannot be estimated accurately. Therefore, we include an intercept term for one of the point processes and omit it from the other. We also tested the inclusion of identical spatial covariates in the intensity functions for both point processes and were still able to recover accurate parameter estimates, though this should be investigated in more detail. In Figure 5, we compare the true shared component, generated through simulation, to the posterior mean estimate of the shared component using the Uniform prior and its associated weights. We find that we are able to recover the shared component relatively accurately.
5.2 Results for Police Stops and Use of Force in Chicago
Point-level datasets from Chicago allow us to use case-control and shared component models to create detailed analyses of the relationship between police use of force and police stops. Ba et al. (2021a) aggregate the policing data we study here to a panel dataset of officer shifts and aggregate spatially by police beat. A comparison of police behavior across “MDSBs” (month, day of week, shift, and beat) allows for comparison of officers with different demographic profiles but similar patrol assignments. Ba et al. (2021a) use ordinary least squares with MDSB fixed effects to determine the effect of officer and citizen characteristics on use of force outcomes. We conduct a spatial analysis of the Chicago data used by Ba et al. (2021a) but preserve the exact spatial information of both the police stops and the use of force incidents, rather than aggregating to beats. This modeling framework allows us to incorporate macroinstitutional factors, such as the decision to deploy more officers to specific neighborhoods, that have intentionally not been considered in some previous work (Knox et al., 2020a, b).
First, we fit the case-control model to the police use of force and stop data from Chicago through two estimation procedures: logistic regression and the spatial NHPP model, the latter utilizing a KDE. We also fit two different sets of spatial covariates, one with the Herfindahl Index and the unemployment rate and the other substituting median age for the unemployment rate. We find that the two estimation procedures produce different results, as shown in Table 4. For example, the confidence interval for the median age regression coefficient includes 0 for logistic regression, whereas the coefficient is estimated to be negative using the NHPP. We include the 95% confidence intervals for logistic regression and the 95% credible intervals for the NHPP. We note that the intervals do not overlap in the majority of cases. We also evaluate model fit using WAIC for the NHPP and AIC for logistic regression, where lower values of both indicate better model fit. The two criteria disagree: the covariate set preferred under one estimation procedure is not the one preferred under the other. Many open questions remain from this analysis of the use of force and stop data from Chicago using two estimation procedures for the same model. We find that it is perhaps safest to interpret the regression coefficients obtained from fitting a logistic regression here in terms of the factors influencing whether an event is a case rather than a control. This is in contrast to interpreting the results as factors influencing the spatial distribution of cases, scaled by controls.
Next, we analyze the Chicago police stop and use of force datasets using the shared component model. We adopt the approach of Xu et al. (2019) in sampling one of the point processes. We spatially thin the stop dataset due to the large number of points in this dataset, where each point has an equal probability of remaining in the dataset. In our case, we consider the thinning probability to be 10%. In the Appendix, Section 8.3, we describe this approach in detail. In our analysis, we are only interested in describing the spatial distribution and the spatial relationship between use of force incidents and police stops. If we were interested in any other characteristics of the stop data, such as the demographics of the citizens or officers involved, we would want to consider either using the full stop data or a different sampling approach that takes into account these variables of interest.
In Table 5, we include the estimated coefficients from the intensity functions for both point processes and the parameters associated with the shared component. We apply both parameterizations of the weight term, identified in Table 5 by the prior distribution (either Normal or Uniform). We see that the estimates of the covariate effects are similar between the two parameterizations of the shared component model, in both the point estimates and the credible intervals. We note that the weight parameter for both parameterizations indicates that the shared component contributes more strongly to the use of force data than to the police stop data. As in Knorr-Held and Best (2001), we assume that the shared component is indicative of spatial variation in factors influencing both police use of force and stops, such as increased police activity or population. Therefore, we interpret the weight estimates for both parameterizations as showing that the shared spatial pattern, due to factors such as police activity or population, affects the distribution of police use of force events more than that of police stops. The only parameter that notably differs between the two parameterizations is the covariance function parameter of the Gaussian process, which is estimated to be higher when the weight parameter is bounded between 0 and 1 with a Uniform prior.
We would also like to compare the estimated spatial shared component to the point process-specific components. In Figure 6, we plot the inferred spatial distribution for all facets of the shared component model. We note that the point process-specific part of the model in this case is a nonhomogeneous Poisson process, so the estimated process-specific intensity varies only across census tracts. This is illustrated in the middle and right plots of Figure 6. In future work, one could consider incorporating an LGCP into this framework, although there may be problems due to confounding between the spatially varying shared component and the Gaussian process unique to each point process. In Figure 6, we plot the posterior mean estimate for the shared component on the left side, transformed over the integration points used in our analysis. We see that there is a notable shared spatial pattern between the two point processes. We note that we have not used population as a spatial covariate in our model. Therefore, this shared spatial component may largely be determined by population and/or increased police activity in these areas, as noted earlier. In the future, we could also consider scaling both intensity functions by population density, as is done frequently for point process intensity functions (cf. Liang et al., 2008; Walder et al., 2020). Next, we turn our attention to the spatial components unique to each point process, which indicate the existence of factors that are relevant to only police use of force or police stops, but not both. We analyze the posterior mean estimates for the regression coefficients and note that the intensity of police use of force incidents beyond the shared component is most notable on the northern side of the city, with some higher intensity spots on the southern side of the city as well.
On the other hand, the police stop intensity is highest in the far south and middle parts of the city, after accounting for the shared component. This information can inform our analysis of where use of force and police stops are higher, after accounting for the shared spatial pattern. Future research could identify additional spatial variables and socioeconomic information to include in the model that would inform the factors that influence one point process but not both. This could help identify why police stops occur more frequently in some areas than we would expect from police activity alone, and could answer the same question for police use of force.
6 Discussion
We have demonstrated through Chicago policing data and simulated examples that the shared component model developed here can provide new insights when relating two point processes. In practice, a causal direction between the two processes is often not justified. Our model provides a mechanism to compare two point processes without specifying which process is a case or control, thereby avoiding the requirement that modelers impose a causal direction. We can fully characterize spatial patterns common to both point processes as well as those unique to each point pattern. The application to Chicago policing data provides a particularly useful context in which to tease apart spatial patterns that are common to use of force and police stops as well as patterns that are unique to each point process. Our approach involves the use of a Gaussian process, which is more computationally complex than the case-control model but provides additional flexibility. We also suggest methods to decrease this computational burden, such as sampling one of the point processes, as suggested by Xu et al. (2019).
The shared component model allows us to create a rich characterization of the relationship between two point processes. Instead of scaling one point process intensity by the intensity of another point process, we can incorporate spatial/community variables when determining the possible unique effects on these point processes, after determining drivers for both point processes, such as population. This allows us to analyze potential drivers of both point processes, rather than just one of the point processes. This model also allows for visualization of the spatial trends of both point processes, as well as drivers of both point processes that have been inferred from the data.
In this work, we focus our analysis on the comparison of the spatial distribution between two point patterns: police use of force and police stops. This framework could be expanded in the future to more than two point processes by allowing the weight parameter of the shared component to adapt to more than two point processes. This analysis does not attempt to analyze the relationship between officer and citizen characteristics and the spatial patterns of police behavior, though this could be achieved by combining this method with existing methods, such as those developed by Kelling and Haran (2022). We also note that the shared component could take forms other than a Gaussian process, for instance by using a clustering model (Knorr-Held and Best, 2001).
7 Acknowledgements
The authors would like to thank Professor Peter Diggle and Professor Ephraim Hanks for helpful conversations that greatly improved the manuscript. This project was supported by Award No. 2020-R2-CX-0033, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the Department of Justice.
8 Appendix
8.1 Integration Points
Unlike other urban areas, Chicago has many very small census tracts. In order to accurately integrate the intensity function for Chicago, we must include at least one point per census tract and the number of integration points per census tract must be proportional to the area of the census tract, so that all census tracts are weighted by their area when integrating the intensity function over the region (cf. Liang et al., 2008). We use a total of 7,304 integration points for Chicago. Alternative parameterizations, such as the number of points being proportional to the square root of the area, create inaccurate estimates of the integral for Chicago, due to the numerous small census tracts. We have shown the integration points as well as the knots used for the predictive process for the spatially varying shared component in Figure 7.
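The allocation rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to produce the 7,304 integration points: the tract areas and the target total are hypothetical, and rounding together with the one-point minimum means the realized total can differ from the target.

```python
def allocate_integration_points(tract_areas, total_points):
    """Points per tract proportional to area, with at least one point per tract."""
    total_area = sum(tract_areas)
    return [max(1, round(total_points * a / total_area)) for a in tract_areas]

# Hypothetical tract areas spanning several orders of magnitude.
areas = [0.01, 0.5, 2.0, 35.0]
counts = allocate_integration_points(areas, total_points=1000)
# counts is [1, 13, 53, 933]
```

In this toy example the smallest tract would receive no points under strict proportionality, so the minimum of one point applies; such a tract is then slightly over-represented in the average.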
Making the number of integration points per census tract exactly proportional to its area is particularly important for Chicago, where the difference in area between the smallest and largest census tracts is quite extreme. As an example, we compare Dallas, Texas and Chicago, Illinois, both of which contain a range of sizes for census tracts. The largest census tract in the Dallas city limits is 115 times bigger than the smallest census tract. For Chicago, the largest tract is 3,573 times bigger than the smallest tract. We also analyze more typical values, rather than the extremes, through the first and third quartiles of the census tract areas. The third quartile of the Dallas census tracts is 2.8 times bigger than the first quartile, while the third quartile of the Chicago census tracts is 5.3 times bigger than the first quartile.
8.2 Shared Component Weighting Distribution
We simulate two point processes using identical parameters, as shown in Table 3, but different weighting schemes and distributions. These are based on spatial covariates shown in Figure 3. In magenta in Figure 8, we show the point process that results from the weight contributions and to the two point processes, with a Uniform(0,1) prior for the parameter. In blue, on the bottom of Figure 8, we show the two point processes that result from the weights and with a Normal prior for the parameter. In both simulations, was set to be 0.3.
8.3 Sample of Chicago Stops
For the shared component model, we must transform the shared component (which involves a Gaussian process) to match the dimension of both the stop data and the use of force data. The police use of force data has 7,539 events with complete spatial information, which is computationally manageable. The stop data, on the other hand, contains 1,453,832 events with complete spatial information, which makes the required large matrix computations a computational challenge. We find that uniform thinning across all points changes the spatial distribution of the points very little between the full stop data and the thinned data. In Figure 9, we show the kernel density estimate (KDE) for the full stop data next to that for the thinned stop data, where the probability of keeping each point is 10%. The thinning results in a dataset of 145,783 points. The KDEs of the two point processes look almost identical, except for the scale, shown to the right of the plot. Our approach is similar to that of Xu et al. (2019) in that we sample one of the point processes. In future work, when we are interested in analyzing different features of the stops, we may want to consider the full point process or a more complex sampling mechanism that also samples based on other variables.
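As a rough illustration of uniform thinning, the sketch below keeps each point independently with probability 0.10 and checks that the spatial distribution is approximately preserved. The point pattern is synthetic, the helper `thin_points` is hypothetical, and a normalized two-dimensional histogram stands in for the KDE used in the paper.

```python
import numpy as np

def thin_points(points, keep_prob, rng):
    """Independent (uniform) thinning: keep each point with probability keep_prob."""
    points = np.asarray(points)
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in (0, 1]")
    keep = rng.random(len(points)) < keep_prob
    return points[keep]

rng = np.random.default_rng(0)

# Synthetic clustered pattern standing in for the stop locations (assumption).
centers = np.array([[0.3, 0.3], [0.7, 0.6]])
labels = rng.integers(0, 2, size=100_000)
points = centers[labels] + rng.normal(scale=0.08, size=(100_000, 2))

thinned = thin_points(points, keep_prob=0.10, rng=rng)

# Compare the share of points in each grid cell before and after thinning.
bins = np.linspace(0.0, 1.0, 21)
full_hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[bins, bins])
thin_hist, _, _ = np.histogram2d(thinned[:, 0], thinned[:, 1], bins=[bins, bins])
full_share = full_hist / full_hist.sum()
thin_share = thin_hist / thin_hist.sum()
max_abs_diff = np.abs(full_share - thin_share).max()

print(len(points), len(thinned), max_abs_diff)
```

Independent thinning with retention probability p multiplies the intensity by p everywhere, which is why the two density surfaces are expected to differ in scale but not in shape.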
8.4 Relationship Between Case-Control and Shared Component Model
In this section, we elaborate on the relationship between the case-control model (cf. Diggle et al., 2007) and our shared component model for point processes. The case-control model defines the intensity of a case process, , based on the intensity of a control process, , as follows
[TABLE]
As a reminder, the shared component model, shown below, describes the intensities of two point processes separately (denoted and ) but relies on a process that contributes to the intensity of both processes, which we have denoted .
[TABLE]
This structure of a shared process contributing to both intensities allows us to write the shared component model as a version of the case-control model.
Our goal is to determine the parameterization of the ‘scaling process’ for the shared component model, which is simply the control process for the typical case-control model. In Equation 5, we rewrite the intensity for the first point process in the shared component model to show the scaling process under the structure of the case-control model. We do this by solving for the shared component that contributes to both point processes, , in the intensity of the second point process and substituting this in the intensity of the first point process. We denote spatial covariates as and their corresponding regression coefficients as for the first point process and spatial covariates as and regression coefficients as for the second point process. Instead of purely being an estimate of the second spatial intensity function as in the case-control model (), now this scaling process for the re-parameterized shared component model depends on the second spatial intensity, the covariates and coefficients for the second spatial intensity, and the parameter which defines the contribution of the shared component to both point processes.
[TABLE]
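The substitution described in this section can be sketched under assumed notation. The symbols below are illustrative and are not necessarily those of the paper: we assume log-linear intensities, a shared process $w(s)$, and generic nonzero weights $a_1$ and $a_2$ on the shared process, with covariates $x_k(s)$ and coefficients $\beta_k$ for process $k$.

```latex
\begin{align*}
\lambda_1(s) &= \exp\{x_1(s)^\top \beta_1 + a_1 w(s)\}, \qquad
\lambda_2(s) = \exp\{x_2(s)^\top \beta_2 + a_2 w(s)\} \\
w(s) &= \frac{\log \lambda_2(s) - x_2(s)^\top \beta_2}{a_2} \\
\lambda_1(s) &= \exp\{x_1(s)^\top \beta_1\}\,
  \bigl[\lambda_2(s)\exp\{-x_2(s)^\top \beta_2\}\bigr]^{a_1/a_2}
\end{align*}
```

Under these assumptions, the bracketed term plays the role of the scaling process: it depends on the second intensity, on the covariates and coefficients of the second intensity, and on the weights of the shared component. When $a_1 = a_2$ and the second process has no covariate effects, the first intensity reduces to the second intensity multiplied by a covariate term.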
References
- Antonovics, K. and Knight, B. G. (2009) A new look at racial profiling: Evidence from the Boston Police Department. The Review of Economics and Statistics, 91, 163–177.
- Ba, B. A., Knox, D., Mummolo, J. and Rivera, R. (2021a) The role of officer race and gender in police-civilian interactions in Chicago. Science, 371, 696–702.
- Ba, B. A., Knox, D., Mummolo, J. and Rivera, R. (2021b) The role of officer race and gender in police-civilian interactions in Chicago [Source Code]. URL: https://doi.org/10.24433/CO.4519202.v1.
- Banerjee, S., Gelfand, A. E., Finley, A. O. and Sang, H. (2008) Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 825–848.
- Berman, M. (1986) Testing for spatial association between a point process and another stochastic process. Journal of the Royal Statistical Society: Series C (Applied Statistics), 35, 54–62.
- Chetwynd, A. G., Diggle, P. J., Marshall, A. and Parslow, R. (2001) Investigation of spatial clustering from individually matched case-control studies. Biostatistics, 2, 277–293.
- de Valpine, P., Turek, D., Paciorek, C., Anderson-Bergman, C., Temple Lang, D. and Bodik, R. (2017) Programming with models: writing statistical algorithms for general model structures with NIMBLE. Journal of Computational and Graphical Statistics, 26, 403–413.
- Diggle, P. J. (1990) A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society: Series A (Statistics in Society), 153, 349–362.
