Filaments of crime: Informing policing via thresholded ridge estimation

Ben Moews; Jaime R. Argueta Jr.; Antonia Gieschen

arXiv:1907.03206·stat.AP·November 16, 2022


TL;DR

This paper introduces a novel method using density ridge estimation to identify crime hot spots, improving patrol efficiency and coverage, with demonstrated application on Chicago crime data and an open-source software tool.

Contribution

It adapts and extends the subspace-constrained mean shift algorithm for geospatial crime data, providing a new approach for hot spot analysis and patrol optimization.

Findings

1. Ridges align with broader kernel density estimates.
2. Patrol templates cover around 94% of incidents within 0.1 miles.
3. The method offers stable, data-driven routes for crime prevention.

Abstract

Objectives: We introduce a new method for reducing crime in hot spots and across cities through ridge estimation. In doing so, our goal is to explore the application of density ridges to hot spots and patrol optimization, and to contribute to the policing literature in police patrolling and crime reduction strategies. Methods: We make use of the subspace-constrained mean shift algorithm, a recently introduced approach for ridge estimation further developed in cosmology, which we modify and extend for geospatial datasets and hot spot analysis. Our experiments extract density ridges of Part I crime incidents from the City of Chicago during the year 2018 and early 2019 to demonstrate the application to current data. Results: Our results demonstrate nonlinear mode-following ridges in agreement with broader kernel density estimates. Using early 2019 incidents with predictive ridges…

Tables

Table I: Part I crime incident numbers for Chicago during the year 2018. Different primary crime types are listed separately, with entries descending by the number of reported incidents.

Primary crime type	Number of data points
Larceny-theft	42,423
Aggravated assault	13,843
Burglary	7,821
Motor vehicle theft	6,641
Robbery	6,525
Forcible rape	1,013
Criminal homicide	386
Arson	242

Equations

$$\hat{\beta}=\frac{1}{k|\theta|}\sum_{i=1}^{|\theta|}\sum_{j=1}^{k}\textarc{d}(\theta_{i},\theta_{j}) \qquad (1)$$

$$\textarc{d}_{\mathrm{hav}}(\theta_{1},\theta_{2})=\mathrm{hav}(\theta_{2,1}-\theta_{1,1})+\cos(\theta_{1,1})\cos(\theta_{2,1})\,\mathrm{hav}(\theta_{2,2}-\theta_{1,2}) \qquad (2)$$

$$\phi_{n,i}=v^{\prime}v^{\prime\top}\frac{\sum_{j=1}^{|\psi|}\sigma_{j}\theta_{j}}{\sum_{j=1}^{|\psi|}\sigma_{j}}-\psi_{i} \qquad (3)$$

$$\psi^{\prime}=\left\{\hat{\psi}\in\psi:\mathrm{KDE}_{\mathrm{RBF}}(\hat{\psi},\beta)\geq\gamma\right\},\ \text{with}\ \gamma=\min\left(\mathrm{sort}_{\mathrm{desc}}\left(\mathrm{KDE}_{\mathrm{RBF}}(\psi,\beta)\right)_{1,2,\dots,\lfloor\frac{p}{100}|\psi|\rfloor}\right) \qquad (4)$$


Code & Models

Repository: moews/dredge (official)


Full text


Ben Moews: Institute for Astronomy, University of Edinburgh, Royal Observatory, Edinburgh, EH9 3HJ, UK. Email: [email protected]

Jaime R. Argueta, Jr.: School of Criminal Justice, University of Cincinnati, 2600 Clifton Ave, Cincinnati, OH 45221, USA. Email: [email protected]

Antonia Gieschen: Business School, University of Edinburgh, 29 Buccleuch Place, Edinburgh, EH8 9JS, UK. Email: [email protected]


Abstract

In this study, we investigate the potential for optimizing hot spot patrol routes through density ridge estimation. We explore the application of an extended version of the subspace-constrained mean shift algorithm by using 2018 and 2019 Part I crime data from Chicago. Ultimately, the goal of mapping hot spots is to show concentrations of crime, so targeting only the epicenters focuses on just one problem area. For this reason, we refine patrol optimization to focus on the critical ridges in hot spots. In doing so, we extract density ridges of 2018 to early 2019 Part I crime incidents from Chicago to demonstrate that nonlinear mode-following ridges agree with broader kernel density estimations. We create multi-run confidence intervals and show that our patrol templates cover around 94% of incidents for 0.1-mile envelopes around ridges, and deliver evidence that ridges following crime densities enhance the efficiency of patrols. Our post-hoc tests show the stability of ridges, thus offering an alternative patrol route option that is effective and efficient.

Keywords:

Density Ridge Estimation, Patrol Routes, Optimized Patrols, Hot Spots

MSC:

62G07 62H11 62P25

1 Introduction

Investigations of hot spot policing tactics find that focused efforts on problem areas, such as staying at a block, effectively reduce crime Braga et al (2014); Corsaro et al (2019). Related research finds that 15 minutes of police presence in a given hot spot significantly decreases both calls for service and Part I crimes Koper (1995); Telep et al (2014). These practices suggest that the stability of crime places allows for the optimization of tactics. Specifically, one way to optimize patrol routes is to focus on the spatial aspects of hot spots, for example targeting streets and assigning prevention resources to them Camacho-Collados and Liberatore (2015).

We explore the optimization of hot spot patrols by identifying density segments as targets for crime prevention. Hot spots have high-density centers depending on the parameters defined by the analyst (pp. 356–357 ff. of Eck and Guerette, 2012). Practitioners and police officers rely on spatial analytics to identify hot spots that dictate their patrols, often over-emphasizing the epicenter’s value. This emphasis potentially over-patrols the core area and under-patrols the surrounding areas Eck et al (2005). Therefore, we propose and investigate an optimized patrol algorithm that identifies crime ridge densities in the surrounding hot spot to allow for a spread of patrol.

Previous scholars explored patrol route optimization through a variety of techniques, including multi-agent-based simulations Fukunaga and Hostetler (1975), machine learning Li et al (2011); Marchant et al (2018), and graph theory and evolutionary computing Chawathe (2007); Al Boni and Gerber (2016). These studies consistently show that route optimization is a feasible task, and account for resources and time. The primary issue among these methods is their limited application, as more complex approaches do not equate to effectiveness. Covering each street and hot spot by spending little time between places may lead to hot spots not meeting the required dosage or frequency of visits Kringen et al (2017). In turn, these patrols may have limited effectiveness or backfire Linning and Eck (2018).

Both practices, hot spot patrols and patrol optimization, tend to focus on the hot spot’s epicenter. Thus, the mismatch between the two bodies creates a gap in efficiency and effectiveness. This mismatch demonstrates three problems. First, patrol algorithms’ implementations fixate on a single spot for hot spot patrols Eck et al (2005). Secondly, common practices of identifying hot spots lack patrol direction Chainey et al (2008a); Ratcliffe (2010). Thirdly, patrol optimization algorithms propose a comprehensive list of all routes to be covered, thus under-patrolling areas.

In this paper, we suggest a way to bridge this gap. Recently, advances in statistics to perform density ridge estimation have enabled the construction of ridges that follow high-density areas, or modes, of a distribution and allow for higher-dimensional extensions Ozertem and Erdogmus (2011); Chen et al (2015a). In effect, this means the extraction of curvilinear structures, or ‘filaments’, that show high-density pathways reflecting an underlying distribution. As such, density ridges are different from mode-finding hot spot approaches, offering the identification of a connected network while identifying finer-grained structures less prone to oversmoothing risks Genovese et al (2014).

For this reason, the present study is exploratory. We seek to address the previous shortcomings of patrol optimization by applying methods from neighboring disciplines. We focus on and extend the subspace-constrained mean shift (SCMS) algorithm Ozertem and Erdogmus (2011) in order to introduce the concept of density ridges to the field of criminology. The identification of ridges will allow law enforcement to efficiently patrol routes in hot spots and surrounding areas. Thus, we contribute to the greater literature by uniting patrol optimization work and ridge estimation in hot spots to select patrol routes. The introduction of density ridges demonstrates advantages by offering efficiency as well as more equitable and focused patrols through the inclusion of finer-grained information on the density landscape. We show that these density ridges cover more problem segments in hot spots compared to hot spot policing or placing police personnel at single locations, and thus are an effective tool more suitable for crime prevention patrol.

We make use of Chicago Part I crime incident data from 2018 to develop and illustrate the application of ridge estimation through computational experiments. Additionally, we use data from January to May 2019 to test for predictive accuracy in coverage, as well as for convergence consistency, with multi-run confidence intervals and additional experiments for alternative method comparisons. Chicago offers an ideal data set that allows for mapping and testing, has a typical urban street network, and provides plentiful crime data. Thus, this paper’s empirical work assesses the potential of patrol optimization in urban cities while going beyond current good policing practices.

2 Literature review

2.1 Hot spots

Over the past two decades, scholars have confirmed that large numbers of calls for service concentrate within 3–5% of a given city Sherman et al (1989); Sherman and Weisburd (1995); Braga et al (2014). Related research finds that hot spots chronically persist for longer than a decade in 5% of block-long street segments Weisburd et al (2004). Since then, the ‘Law of Crime Concentration’ was coined Weisburd (2015), which refers to the concept where crime concentrates in specific small areas of any city or year. Later works find that hot spots vary in size for different types of crimes, for example gun-related crime Braga et al (2010), robberies Braga et al (2012), and other major crimes Haberman (2017).

Findings from these studies enable researchers and practitioners in two ways. The first is testing techniques on stable hot spots and investigating which policing strategies can be the most effective. Patrolling hot spots is reported to not disperse crime to neighboring geographic locations Braga (2007). Instead, deterrent effects are diffused to nearby streets, making efforts in patrolling hot spots a successful endeavor Braga et al (2014). Recently, research raises issues with the amount of patrolling in hot spots, pointing out a possible hormetic effect Linning and Eck (2018). The latter work suggests that if a hot spot does not meet a specific dosage of patrol presence, there may be an increase in crime. Thus, under-policing or even over-policing areas can backfire and result in increases of criminal activity. Similarly, scholars argue that patrols should focus on fewer visits of longer duration at hot spots rather than hitting them randomly and often Williams and Coupe (2017).

The research mentioned above developed in parallel to methods for estimating hot spots. With about 75% of agencies using the hot spots policing approach, most use kernel density estimations to identify intervention areas Weisburd and Majmundar (2018); Mastrofski and Fridell (2015). While kernel methods show great success, they only focus on singular cells and spaces instead of the surrounding problem areas or opportunities. This led to the suggestion that there should perhaps be more than just plotting densities of crime areas Eck (1997). To address this, a risk terrain model has been presented Caplan et al (2011), applying forecasting of opportunity structures throughout a geographic space and building on prior kernel density work. This does, however, still include possible pitfalls of an over-reliance on the epicenter of cells to suggest patrol work. Hence, the risk terrain model and kernel density models still apply a static approach for placing police on dots, or epicenters, with little consideration for the full spatial range identified.

In summary, hot spot policing provides a way for police departments to reduce crimes by patrolling problem areas effectively, but there are limitations to the use of kernel methods. Hot spots illustrate varying levels of crime concentration over a geographic landscape. The empirical work described in this paper suggests that the decision of where the line is drawn by analysts to define a hot spot may vary the amount of attention. Thus, hot spot patrols may benefit from a defined route that exhibits optimal routes to target crime prevention resources so they do not focus on just one area.

2.2 Patrol optimization

For strategic planning, law enforcement makes use of hot spots to identify problem areas to patrol. The visible presence of patrols in a community is one of the key components in reducing crime, especially in hot spots. For this reason, the identification of routes in hot spots is relevant due to patrols being constrained by street networks Menton (2008). Furthermore, given the scarcity of police resources, the efficient allocation of proactive patrols is crucial, and an optimal dosage of police presence at these hot spots needs to be applied. Police agencies identify hot spots with spatial ellipses, grid mapping, thematic mapping, kernel density methods, Getis-Ord Gi*, and point processes for spatial analytics Chainey et al (2008a); Ratcliffe (2010); Xue and Brown (2006). That being said, advanced route planning based on proper hot spot estimations still lags behind most current research.

Scholars have recently turned to algorithms. Patrol optimization deals with identifying optimal routes so that officers target hot spots efficiently. While these methods are complex, they offer the potential to dynamically shape patrol routes to service each call, problem area, or assignment. For example, recent work using the ant colony optimization algorithm and Bayesian methods Chen et al (2015); Furtado et al (2009) shows the practical utility of efficiently hitting each hot spot in an optimal manner. In another approach, dynamic modeling is used by assuming that offenders will predict patrol routes Paruchuri et al (2008), demonstrating the model’s ability to determine optimal paths that balance predictable and unpredictable street network paths. Related research suggests the potential to decrease criminal activity and the public’s fear of crime by modeling patrol routes illustrating the shortest Hamiltonian cycle for visiting each location in a city Chevaleyre (2004).

Additional facets of patrol optimization consider limited patrol resources and take on a variety of approaches. These include the application of patrol optimization using a cost-benefit analysis, maximizing the coverage of hot spots and accounting for the paths between streets and places Chawathe (2007), as well as a multi-agent-based algorithm to design efficient patrol strategies Reis et al (2006). The latter simulation models a city’s road network to find optimal routes to minimize crime in a city. Further work studies changing offenders’ opportunity structure Furtado et al (2006), while related efforts simulate changing problem places that adapt to patrol routes Melo et al (2005). Finally, similar applications look at how district models can be optimized to adequately distribute calls for service or incidents in a given jurisdiction Liberatore et al (2020); Mitchell (1972); Bodily (1978); Piyadasun et al (2017). The shortcoming of each of these works is that they focus on cost efficiency and formulation of routes. Only a few works include the importance of hitting potential problem places, although these studies do not account for the quality of patrols Reis et al (2006); Melo et al (2005).

Even with progress underway, there are still several limitations that patrol optimization studies fail to consider in their analyses. To our knowledge, none of the existing patrol optimization articles and hot spot research meet quality patrol needs while being efficient. Additionally, shortening the scope of patrol optimization to deal with one problem appears to be a feasible approach to managing proper efficiency in crime prevention at hot spots. The modeling of optimal patrol routes and simulated agents to combat problem places and limited resources is still in the early stages. While valuable for researching the impacts of policing strategies, real-world applications of optimal patrol routes are, thus, severely limited.

3 Data and methods

3.1 Crime incident data

We use the Chicago Data Portal (https://data.cityofchicago.org/), an open-access data service. The portal features a complete dataset of reported crime incidents from 2001 to the present day, covering over 17 years, with the exception of murders where data exist for each victim. The crime incident data are provided by the Citizen Law Enforcement Analysis and Reporting (CLEAR) system of the Chicago Police Department. Choosing CLEAR data offered an ideal data set that allowed for mapping, testing, and near-current crime events.

After obtaining the dataset, we extract all entries pertaining to 2018, and retain only three variables of interest: the primary crime type and the two coordinates of the reported crime’s location. We plot the coordinates using ArcMap 10 and TIGER street centerlines projected to Geographic Coordinate System North American 1983 as spatial reference data. After this step, we omit all entries for which at least one of the retained variables is not present. This omission for missing data leads to the data for 2018 being reduced from 178,659 to 177,669 entries, resulting in a negligible loss of around 0.5% of data points.

In this work, we focus on Part I offenses as defined by the Uniform Crime Reports (UCR; https://www.ucrdatatool.gov/). Our choice of Part I crimes reflects both the high priority placed on this type of crime and its reliability, as well as its prior use in patrol route optimization studies Barnett-Ryan et al (2014); Chen et al (2015, 2017). In this context, aggravated assault, forcible rape, criminal homicide, and robbery are Part I violent crimes, whereas arson, burglary, larceny-theft, and motor vehicle theft are Part I property crimes. We extract these eight primary types from the preprocessed dataset, which leaves us with 78,894 incidents of Part I crimes in Chicago during the year 2018, an overview of which is provided in Tab. I. In order to keep our algorithm’s runtime low, and given that we are interested in keeping the overall density profile of crime incidents, we use uniformly-random sampling to reduce the dataset to 5,000 data points and show, in Section 4, the sufficiency of the sample in a predictive case.
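The uniform subsampling step can be sketched in a few lines. This is a minimal illustration rather than the authors' preprocessing code: the function name `subsample`, the fixed seed, and the synthetic coordinates standing in for the 78,894 incidents are all assumptions.

```python
import random


def subsample(points, size, seed=0):
    """Draw `size` points uniformly at random, without replacement."""
    if size >= len(points):
        return list(points)
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(points, size)


# Synthetic stand-in for the (latitude, longitude) pairs of the incidents
incidents = [(41.6 + 1e-5 * i, -87.9 + 5e-6 * i) for i in range(78894)]
sample = subsample(incidents, 5000)
```

Sampling without replacement keeps every incident equally likely to be selected, so the sample is expected to preserve the overall density profile.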

3.2 Subspace-constrained mean shift

Our approach is an extension of the subspace-constrained mean shift algorithm (SCMS), a density ridge estimation method that has been further extended in application areas described below. Following the examples of other scholars, we further adapt and extend the algorithm for a criminological context. The SCMS algorithm can be applied to crime patterns to extract route templates from high-density areas.

In order to provide readers with the background of the employed method in more depth, a short overview of the mathematical foundations is required. Given a probability density function $p:\mathbb{R}^{d}\rightarrow\mathbb{R}$ of dimensionality $d$, as well as a corresponding gradient $g(x)=\nabla p(x)$ and a Hessian $H(x)$, let $v=\{v_{1},v_{2},\dots,v_{d}\}$ be the eigenvectors of $H(x)$ corresponding to eigenvalues $\lambda=\{\lambda_{1},\lambda_{2},\dots,\lambda_{d}\}$ sorted in descending order. Defining $\Lambda(x)$ as the diagonal matrix with $\lambda$ along the diagonal, and with the eigendecomposition $H(x)=U(x)\Lambda(x)U(x)^{\top}$, we let $v^{\prime}$ be the columns of $U(x)$ associated with the $d-1$ smallest entries in $\lambda$. In addition, let $L(x)\equiv L(H(x))=v^{\prime}v^{\prime\top}$ be a projection on the linear space of the columns in $v^{\prime}$, then the projected gradient is defined as $G(x)=L(x)g(x)$. For a map $\xi:\mathbb{R}\rightarrow\mathbb{R}^{d}$, the ridge $R$ can be expressed as $R=\{x:||G(x)||=0,\lambda_{2}(x)<0\}$, where $\lambda_{2}$ is the largest of the eigenvalues associated with $v^{\prime}$ Ozertem and Erdogmus (2011); Genovese et al (2014). In other words, a density ridge is a local density maximization in the normal direction given by the Hessian. While the above provides a bare-bones definition, we refer the interested reader to Genovese et al (2014) for a more detailed introduction to non-parametric ridge estimation.

Kernel density estimation, which is also known as the Parzen-Rosenblatt window, is a non-parametric statistical method to estimate probability density functions Rosenblatt (1956); Parzen (1962). The most common choice, and the one used in our approach, is the radial basis function (RBF) kernel, also known as the Gaussian kernel, with $\mathcal{K}(x)=(1/\sqrt{2\pi})\exp(-0.5x^{2})$ . The SCMS algorithm Ozertem and Erdogmus (2011) is a KDE-based non-parametric iterative approach to estimate the ridges of a probability density function in the context of self-consistent smooth curves using $\nabla p(x)$ and $H(x)$ . While the literature on applications since its recent introduction is sparse, the algorithm has been applied to neuroscience Bas and Erdogmus (2011) and road networks Miao et al (2014), as well as in astronomy Chen et al (2015b, c, a, 2016, 2017); He et al (2017); Hendel et al (2019); Moews et al (2020). Specifically, the method is extended with thresholding Chen et al (2015b) for the application to cosmic web reconstruction, using a KDE over the dataset to counteract the effect of areas with low probability densities.
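To make the iteration concrete, the following is a minimal two-dimensional sketch of a single SCMS update under a Gaussian kernel. It is not the DREDGE implementation: the function name `scms_step`, the use of plain Euclidean coordinates, the closed-form eigenvector of the 2x2 Hessian, and the toy data are assumptions made for illustration. The update projects the mean shift vector onto the Hessian eigenvector with the smallest eigenvalue, so a point moves across the ridge but not along it.

```python
import math


def scms_step(x, data, h):
    """One SCMS update of the 2-D point x for a Gaussian KDE with bandwidth h."""
    w_sum = 0.0
    mean = [0.0, 0.0]
    a = b = c = 0.0  # entries of the unnormalized KDE Hessian [[a, b], [b, c]]
    for px, py in data:
        ux, uy = x[0] - px, x[1] - py
        w = math.exp(-(ux * ux + uy * uy) / (2.0 * h * h))
        w_sum += w
        mean[0] += w * px
        mean[1] += w * py
        a += w * (ux * ux / h**4 - 1.0 / h**2)
        b += w * (ux * uy / h**4)
        c += w * (uy * uy / h**4 - 1.0 / h**2)
    # Mean shift vector: kernel-weighted mean of the data minus the current point
    shift = (mean[0] / w_sum - x[0], mean[1] / w_sum - x[1])
    # Smallest eigenvalue of the symmetric 2x2 Hessian and a matching eigenvector
    lam = 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)
    candidates = [(b, lam - a), (lam - c, b)]
    vx, vy = max(candidates, key=lambda v: math.hypot(v[0], v[1]))
    norm = math.hypot(vx, vy)
    if norm == 0.0:  # isotropic Hessian: any direction is an eigenvector
        vx, vy, norm = 1.0, 0.0, 1.0
    vx, vy = vx / norm, vy / norm
    # Project the mean shift vector onto the eigenvector (v v^T applied to shift)
    dot = vx * shift[0] + vy * shift[1]
    return (x[0] + dot * vx, x[1] + dot * vy)


# Toy data: a thin horizontal band of points around the line y = 0
data = [(0.25 * i, dy) for i in range(41) for dy in (-0.1, 0.0, 0.1)]
point = (5.0, 0.3)
for _ in range(50):
    point = scms_step(point, data, h=0.5)
```

In this toy setting the point is pulled onto the line y = 0 while its position along the band stays unchanged, which is the behavior that distinguishes ridge estimation from plain mode-seeking mean shift.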

The convergence properties of the SCMS algorithm have been analyzed Ghassabeh et al (2013), showing that the method inherits some properties of the previous mean shift algorithm Fukunaga and Hostetler (1975), most importantly its monotonicity and the convergence of density estimates along the output sequence, together with other properties that offer theoretical guarantees for stopping criteria. For an up-to-date contextualization of the approach in the broader field of topological data analysis, we refer the reader to a suitable overview Wasserman (2018), as well as to a more general analysis of non-parametric density ridge estimation Qiao and Polonik (2016). In addition, a study of, as well as extensive proofs for, ridge estimation from a geometrical perspective have been conducted Genovese et al (2012).

3.3 Modifications and extensions

In addition to providing a fast pure-Python implementation of the SCMS algorithm, with thresholding implemented in line 6 of Alg. 1, we introduce multiple modifications of the methodology tailored to geospatial data and applications in criminology.

An optimal bandwidth calculation for crime incident data has been introduced earlier Williamson et al (1999), based on the average distance of each coordinate to its nearest $k$ neighbors, averaged over all coordinates in the dataset Eck et al (2005). For the distance $\textarc{d}(\theta_{i},\theta_{j})$ between two coordinates of a dataset $\theta$ , and with the number of nearest neighbors $k$ , the calculation of the optimal bandwidth $\hat{\beta}$ takes the form of the following equation:

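$$\hat{\beta}=\frac{1}{k|\theta|}\sum_{i=1}^{|\theta|}\sum_{j=1}^{k}\textarc{d}(\theta_{i},\theta_{j}) \qquad (1)$$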

This approach is related to the k-nearest neighbors (k-NN) algorithm, a non-parametric statistical method commonly applied to regression and classification problems Cover and Hart (1967). We make use of this calculation to provide an optimized default bandwidth for our method. Without a bandwidth optimization, the bandwidth would need to be set manually by the user, which would lead to problems in both directions. Either the coverage would be diminished due to too large a bandwidth, resulting in ridges that follow an overly broad density profile, or the ridges would present too fine-grained a net of substructures that would mathematically provide good coverage, but not be practical for patrolling.
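The k-nearest-neighbor bandwidth described above can be sketched as follows. The function name `knn_bandwidth` and the brute-force neighbor search are assumptions for illustration, and the Euclidean default distance is only a placeholder for whichever distance function is used in practice.

```python
import math


def knn_bandwidth(points, k, distance=math.dist):
    """Mean distance of every point to its k nearest neighbors."""
    total = 0.0
    for i, p in enumerate(points):
        # Distances from p to all other points, smallest first
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k])
    return total / (k * len(points))


# Corners of a unit square: each corner has two neighbors at distance 1
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
bandwidth = knn_bandwidth(grid, k=2)
```

The brute-force search scales quadratically with the number of points, which is acceptable for a sketch but would call for a spatial index on larger datasets.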

While the Euclidean distance is a staple in geometric calculations, its use can lead to distorted measures when applied to geospatial coordinates over sufficiently large distances. In this context, the orthodromic distance is the shortest path between two coordinates on a sphere, measured along the sphere’s surface. As such, it provides a sufficiently realistic way to calculate distances as geodesics on an approximated shape of the Earth. While police patrolling is, in practice, a regional problem and local topology outweighs the curvature of our planet, a negligible difference in computational costs allows the resulting software to be applicable to other, more large-scale challenges in other fields.

The haversine function of an angle $\alpha$, $\mathrm{hav}(\alpha)=\sin^{2}(\alpha/2)$, is numerically better-conditioned for small geodesic distances than the spherical law of cosines. The haversine formula Inman (1835) makes use of that function and provides a way to calculate the orthodromic distance suitable for our purposes in that it remains accurate on small-scale local distances and stays applicable on larger scales. Denoting the latitude and longitude of a point $\theta_{i}$ as $\theta_{i,1}$ and $\theta_{i,2}$, respectively, the haversine distance between points $\theta_{1}$ and $\theta_{2}$ is then:

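$$\textarc{d}_{\mathrm{hav}}(\theta_{1},\theta_{2})=\mathrm{hav}(\theta_{2,1}-\theta_{1,1})+\cos(\theta_{1,1})\cos(\theta_{2,1})\,\mathrm{hav}(\theta_{2,2}-\theta_{1,2}) \qquad (2)$$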

One interesting point to note is that the applicability of the haversine distance directly translates to projected astronomical observations, although with flipped horizontal axes, as the sky in the latter is viewed as a sphere with the Earth at its center. This is done in an application of our implementation Moews et al (2020), making direct use of our work across fields as a result, and showing the interdisciplinary applicability of methodological and software developments between fields. In addition to its use for the SCMS algorithm’s iterative updates, we also use this distance for the k-NN approach of calculating an optimal bandwidth, replacing $\textarc{d}(\theta_{i},\theta_{j})$ in Eq. 1 with $\textarc{d}_{\mathrm{hav}}(\theta_{i},\theta_{j})$ from Eq. 2.
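As an illustration of the haversine calculation, the sketch below computes the great-circle distance between two coordinate pairs. The function names, the choice of degrees as input, the Earth radius constant, and the conversion to miles are assumptions of this example. The haversine expression yields the haversine of the central angle, which is converted to a surface distance through the inverse relation $2r\arcsin(\sqrt{\cdot})$.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an assumed constant


def hav(alpha):
    """Haversine of an angle given in radians."""
    return math.sin(alpha / 2.0) ** 2


def haversine_distance(p1, p2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    h = hav(lat2 - lat1) + math.cos(lat1) * math.cos(lat2) * hav(lon2 - lon1)
    # Clamp to 1 to guard against rounding just above the valid arcsin domain
    return 2.0 * radius * math.asin(math.sqrt(min(1.0, h)))


# Two points on the equator, a quarter of the circumference apart
quarter = haversine_distance((0.0, 0.0), (0.0, 90.0))
```

A function of this shape can be passed wherever a generic distance between two coordinates is needed, including the nearest-neighbor bandwidth calculation.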

Well-approximated density ridges require the SCMS algorithm to run over a sufficient number of iterations. Since a trial-and-error approach is not the most time-efficient way of using the algorithm, we implement a convergence check that uses the mean shift update in Alg. 1. Let the update be denoted as $\phi_{n,i}$, for iteration $n$ and $i\in\{1,2,\dots,|\psi|\}$ for ridge candidate points $\psi$, then the calculation takes the following form:

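$$\phi_{n,i}=v^{\prime}v^{\prime\top}\frac{\sum_{j=1}^{|\psi|}\sigma_{j}\theta_{j}}{\sum_{j=1}^{|\psi|}\sigma_{j}}-\psi_{i} \qquad (3)$$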

We then introduce the convergence criterion, for a convergence threshold $c$ , as the absolute difference between an iteration’s current update and the last iteration’s update not exceeding the convergence threshold, meaning that $||\phi_{n-1,i}-\phi_{n,i}||\leq c$ . Without this introduced convergence criterion, the number of iterations would, as in the original SCMS algorithm, need to be set manually by the user. This poses the challenge of correctly guessing the number of required iterations to create well-defined ridges, as too small a value would result in fuzzy ‘clouds’ along the ridges, as opposed to ridge lines. The value would thus need to be set rather large in order to avoid this issue, hoping to overshoot the necessary but unknown value, which would, even if successful, increase the computational costs and thus the runtime of the algorithm.
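The stopping rule can be sketched as a loop that compares successive updates. This is a simplified illustration: the helper name `iterate_until_converged`, the `max_iter` safeguard, the choice to require the criterion for all points at once, and the toy update that halves the distance to the origin are assumptions rather than details taken from the text.

```python
import math


def iterate_until_converged(points, update, threshold, max_iter=1000):
    """Apply `update` to all points until successive updates differ by <= threshold."""
    previous = None
    for n in range(1, max_iter + 1):
        shifts = [update(p) for p in points]
        points = [(p[0] + s[0], p[1] + s[1]) for p, s in zip(points, shifts)]
        if previous is not None:
            # Largest change between the last update and the current one
            change = max(math.dist(old, new) for old, new in zip(previous, shifts))
            if change <= threshold:
                return points, n
        previous = shifts
    return points, max_iter


def toy_update(p):
    """Stand-in for the SCMS update: move halfway toward the origin."""
    return (-0.5 * p[0], -0.5 * p[1])


final_points, iterations = iterate_until_converged([(1.0, 0.0)], toy_update, 0.005)
```

With this rule, the number of iterations adapts to the data instead of being fixed in advance, which avoids both under-converged 'clouds' and unnecessary extra runtime.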

Lastly, practitioners in criminology are often primarily interested in hot spots, focusing their efforts on regions with high probability densities. In order to enable this use of our method, we propose a cut-off functionality to return only ridge estimates in regions with a high number of data points in comparison to the dataset. For a given percentage value $p$ , the KDE in the SCMS algorithm is used to only retain ridge estimate points above the $(100-p)^{\mathrm{th}}$ percentile of the dataset’s estimated probability density function. This means that the ridge estimate points $\psi$ are, for a bandwidth $\beta$ and a Gaussian-kernel KDE, reduced to a subset $\psi^{\prime}$ :

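$$\psi^{\prime}=\left\{\hat{\psi}\in\psi:\mathrm{KDE}_{\mathrm{RBF}}(\hat{\psi},\beta)\geq\gamma\right\},\ \text{with}\ \gamma=\min\left(\mathrm{sort}_{\mathrm{desc}}\left(\mathrm{KDE}_{\mathrm{RBF}}(\psi,\beta)\right)_{1,2,\dots,\lfloor\frac{p}{100}|\psi|\rfloor}\right) \qquad (4)$$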

This approach allows for the exclusive retention of ridge estimates that fall within regions of high probability densities, effectively slicing the density landscape horizontally at the required percentage level and extracting the ridge estimate points that can be found on the remaining landscape. The advantage of this extension is that a top-percentage level of crime density can be freely chosen to concentrate hot spot policing efforts on a highest-density subset of areas in line with additional considerations by the respective practitioners.
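The cut-off can be illustrated with a short sketch that evaluates a Gaussian-kernel density at each ridge point and keeps the top share. The function names, the unnormalized kernel, the Euclidean distance, and the toy coordinates are assumptions for illustration only.

```python
import math


def gaussian_kde(point, data, bandwidth):
    """Unnormalized Gaussian-kernel density estimate at `point`."""
    return sum(
        math.exp(-math.dist(point, q) ** 2 / (2.0 * bandwidth ** 2)) for q in data
    ) / len(data)


def threshold_ridges(ridge_points, data, bandwidth, p):
    """Keep the ridge points whose density is among the top p percent."""
    densities = [gaussian_kde(r, data, bandwidth) for r in ridge_points]
    keep = math.floor(p * len(ridge_points) / 100)
    if keep == 0:
        return []
    # gamma is the smallest density among the `keep` densest ridge points
    gamma = sorted(densities, reverse=True)[keep - 1]
    return [r for r, d in zip(ridge_points, densities) if d >= gamma]


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
ridge_points = [(0.05, 0.05), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (5.0, 5.0)]
kept = threshold_ridges(ridge_points, data, bandwidth=0.5, p=40)
```

Because only the ordering of densities matters for the cut-off, the kernel's normalization constant can be dropped without changing which ridge points are retained.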

We introduce a pure-Python software tool for density ridge estimation describing geospatial evidence (DREDGE), written for Python 3. The tool itself is available on, and can be installed via, the Python Package Index (PyPI; https://pypi.org/). We also provide the complete code for DREDGE in a public repository (https://github.com/moews/dredge), accompanied by documentation, a quickstart tutorial, and a use case featuring example code.

4 Results

4.1 Primary experiment and visualization

The theoretical work on hot spots and direct patrols has been widely applied and studied within the field of criminology Braga et al (2014). Current applications of patrol routes include patrols that are mainly planned using street network models and KDE Mamalian et al (1999); Ratcliffe (2004a). Our work seeks to capitalize on that aspect through density ridge estimation. The density ridges obtained through this experiment with Chicago’s 2018 Part I crime incidents are shown in both panels of Fig. 1. In the left panel, we additionally show a sample of 5,000 coordinates of reported crime incidents, the same size as used by the DREDGE run. In the right panel, we overlay the density ridges with a KDE based on the same optimal bandwidth used by our method, demonstrating the center-line compliance of ridges with hot spots identified by traditional approaches. We show how DREDGE results line up with the underlying data as well as KDE outputs to demonstrate how our results follow high-density areas identified with this alternative approach, using the latter as a comparison baseline.

The implementation of our method described in this paper is run with default values, allowing the software to make use of its adaptive behavior. We run this experiment on an Intel Core i7-5600U CPU with 2.60 GHz, two cores, and four threads, on a machine featuring a sufficient 8 GB of RAM and resulting in a runtime of 6 minutes and 26 seconds. The algorithm was not parallelized, running in a single-threaded fashion to gauge the out-of-the-box performance, although low-level parallelization on a CPU can be easily implemented with the ‘multiprocess’ package (pypi.org/project/multiprocess).

The practical application and ease of policing hot spots has allowed police departments to patrol specific areas more readily Weisburd and Lum (2005). Capitalizing on hot spots, we make use of DREDGE’s ability to retrieve density ridges from a specified level of high-density areas, as discussed in Section 3.3. Fig. 2 shows the respective top-percentage ridges retrieved through this experiment. Both panels show partial density ridges, making use of the built-in threshold functionality set to 5% for density ridges covering the region above the 95th percentile of the incident density distribution. As in Fig. 1, the left panel additionally shows a sample of 5,000 coordinates of reported crime incidents, the same size as used by the DREDGE run, to show the relation of ridges to the underlying dataset, whereas the right panel overlays the density ridges with a KDE for the data points relevant to the top 5% ridges as a comparison baseline.
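The thresholding step can be illustrated with a minimal sketch. It is not DREDGE's internal code: the function names, the isotropic Gaussian kernel, the planar treatment of coordinates, and the choice to take the cutoff from densities evaluated at the incident coordinates are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_density(points, samples, bandwidth, chunk=512):
    """Evaluate an isotropic Gaussian KDE built from `samples` at `points`."""
    dim = samples.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    density = np.empty(len(points))
    # Process evaluation points in chunks to keep memory use moderate
    for start in range(0, len(points), chunk):
        block = points[start:start + chunk]
        sq_dist = ((block[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
        kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        density[start:start + chunk] = kernel.mean(axis=1) / norm
    return density

def threshold_ridges(ridge_points, samples, bandwidth, percentage=5.0):
    """Keep ridge points lying above the (100 - percentage)th density percentile.

    Assumption: the percentile is taken over densities at the incident samples.
    """
    sample_density = gaussian_kde_density(samples, samples, bandwidth)
    cutoff = np.percentile(sample_density, 100.0 - percentage)
    ridge_density = gaussian_kde_density(ridge_points, samples, bandwidth)
    return ridge_points[ridge_density >= cutoff]
```

With `percentage=5.0`, only ridge points in regions at least as dense as the 95th percentile of incident densities are retained, which mirrors the setting used for Fig. 2.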

Due to the same underlying analysis, the high-density area highlighted through the KDE visualization in Fig. 1 corresponds to the hot spot singled out in Fig. 2. This location falls within the Near North Side and Loop areas of downtown Chicago, known as tourist and shopping destinations. An obvious interpretation of this high-density accumulation of data points relies on the considerable overrepresentation of larceny-theft in our dataset, as these areas provide ample opportunity for such crimes, combined with scaling effects due to the number of people frequenting them; we confirmed that over a fifth of the larceny-theft reports occur in these areas.

4.2 Post-hoc analyses and comparisons

To test the predictive accuracy and stability of density ridges over time, we extract another dataset from the Chicago Data Portal. The procedure remains the same as in Section 3.1, but with data for the year 2019 until the end of May. This amounts to 38,205 preprocessed Part I crimes. For this experiment, we use the complete dataset without subsampling, in addition to confidence intervals for multiple runs, to assess the stability of the algorithm’s performance. We employ the ridges extracted from the previous 2018 data to simulate route optimization. We measure the distance to the nearest density ridge and calculate the percentage of incidents falling into this envelope around ridges. In doing so, we quantify the number of incidents in 2019 that happen near a route template based on 2018 data.
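The envelope measurement can be sketched as follows. This is an illustrative reading of the procedure rather than the code used for the experiments: the function names are hypothetical, ridges are represented by their sampled points, and distances use the haversine formula with an assumed mean Earth radius of 3,958.8 miles.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between coordinates given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def nearest_ridge_distance(incidents, ridges):
    """Distance from each incident to its closest ridge point.

    Both inputs are arrays of shape (n, 2) holding (latitude, longitude) rows.
    """
    distances = haversine_miles(incidents[:, None, 0], incidents[:, None, 1],
                                ridges[None, :, 0], ridges[None, :, 1])
    return distances.min(axis=1)

def envelope_coverage(incidents, ridges, width_miles):
    """Share of incidents within `width_miles` of any ridge point."""
    nearest = nearest_ridge_distance(incidents, ridges)
    return float(np.mean(nearest <= width_miles))
```

For a dataset of the size used here, the full incident-by-ridge distance matrix can become large, so processing incidents in batches may be preferable in practice.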

This approach is related to a hit rate, or the percentage of crime incidents occurring within an area of a certain size Chainey et al (2008b). Related work Bowers et al (2004) proposes a search efficiency rate that counts the number of events per square kilometre, although this approach lacks comparability between different study areas. Our choice to measure the percentage of overall crime incidents within envelopes around ridges bears the closest resemblance to the prediction accuracy index (PAI) Bowers et al (2004). The PAI computes the percentage of crime incidents within a predicted area, divided by the size of the predicted area as a percentage of the respective study area. Notably, accounting for the predicted area size is not a concern in our ridge-specific approach, which predicts curvilinear filaments instead of areas. Instead, our measurement’s equivalent is the envelope width around ridges, which requires the calculation of the crime incident coverage for varying widths in order to accurately represent the ridges’ success.
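For reference, the PAI as described in words above can be written out explicitly. The symbols are shorthand introduced here for illustration: $n$ is the number of incidents inside the predicted area, $N$ the number of incidents in the study area, $a$ the size of the predicted area, and $A$ the size of the study area.

$$\mathrm{PAI} = \frac{n/N}{a/A}$$

The ridge-based counterpart drops the area normalization and instead reports coverage as a function of the envelope width $w$, with $d(x_i, R)$ denoting the distance from incident $x_i$ to the nearest point of the ridge set $R$:

$$C(w) = \frac{1}{N}\,\bigl|\{\, i : d(x_i, R) \le w \,\}\bigr|$$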

We compute this experiment for distances in the interval $[0.1,1]$ in miles, in steps of 0.01 miles, and repeat the experiment 10 times for each distance to recover confidence intervals. Each of these 10 runs per distance step is based on a random subsample of reports from the year 2018, with a different random seed each time, to validate the efficacy of a comparatively small subsample of 5,000 data points. In addition, we measure the number of iterations required each time to test the suitability of the convergence criterion introduced in Section 3.3. We also measure the same distance envelope coverage for the top 5% of crime report densities as shown in Fig. 2, and with the same ridges as depicted there, for which we identify five clusters via the mean shift algorithm and use 5%-thresholded crime report coordinates. Using random within-cluster connections with interpolation, we create random routes per identified hot spot to compare the predictive coverage of our approach to random patrols within hot spots, as well as with the use of solely the hot spot center as a point of reference.
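The aggregation over envelope widths and repeated runs can be sketched as below. The function names are hypothetical, and the interval uses a normal approximation with $z = 1.96$, which is an assumption; the text does not state how the confidence intervals were computed, and with only 10 runs a t-based interval would be somewhat wider.

```python
import numpy as np

# Envelope widths from 0.1 to 1 mile in steps of 0.01 miles (91 values)
widths = np.linspace(0.1, 1.0, 91)

def coverage_curve(nearest_distances, widths):
    """Share of incidents within each envelope width.

    `nearest_distances` holds, per incident, the distance in miles to the
    closest ridge point for one run.
    """
    nearest_distances = np.asarray(nearest_distances, dtype=float)
    return np.array([(nearest_distances <= w).mean() for w in widths])

def mean_with_ci(curves, z=1.96):
    """Mean coverage curve and approximate 95% interval across runs (rows)."""
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    # Standard error of the mean across runs, using the sample standard deviation
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return mean, mean - z * sem, mean + z * sem
```

Each of the 10 runs would contribute one row to `curves`, obtained from ridges fitted on a differently seeded subsample of the 2018 data and evaluated against the 2019 incidents.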

Fig. 3 shows the results of these experiments. The black line in the left-hand plot depicts the share of Part I crime reports in the City of Chicago from 2019 data on the vertical axis, depending on the size of the distance envelope around ridges on the horizontal axis. The shaded area around the curve indicates 95% confidence intervals for 10 runs per distance, demonstrating the low variation in coverage for comparatively small subsamples that enable fast runtimes. We expect a highly concave curve path to reflect a diminished increase in coverage with higher distances, as ridges should closely follow higher-density areas due to the way they are computed. The curve path in the figure clearly shows this behavior, with ridge envelopes covering 94% of incidents at 0.1 miles for the whole city, quickly rising to 97.5% and 98.5% at 0.2 and 0.3 miles, respectively, and reaching 99% coverage at about 0.6 miles.

In the lower right corner of the left-hand plot, we show a box-and-whiskers plot, with the upper and lower boundaries of the box indicating values within 1.5 times the interquartile range, the horizontal line intersecting the box denoting the median, and the off-standing ‘whiskers’ indicating the minimum and maximum values McGill et al (1978). The number of iterations remains stable for different subsamples, demonstrating consistent convergence for subsampled sets in line with the narrow confidence intervals shown in the primary plot. The right-hand plot of Fig. 3 confirms the viability of density ridges as route templates, outperforming random routing within identified hot spot areas. As routes through a hot spot should offer more distance-based coverage by virtue of covering a larger area, one can reasonably expect center-only measurements to underperform both approaches as a sanity check, which the right-hand plot demonstrates.

4.3 Mappability and route waypoints

In order to allow for a translation of ridges to route guidelines, we make use of the R package ‘osmar’ Eugster and Schlesinger (2013), which enables access to OpenStreetMap data. For each ridge point, we calculate the closest registered node on OpenStreetMap to allow us to make use of the underlying maps. From each of these neighbouring nodes, we then identify the nearest node which is located on a highway. These highway nodes can be seen in Fig. 4 for the ridges in the top-5% areas, as well as for a zoom-in of a single ridge segment. As OpenStreetMap is compatible with commonly used navigation systems, adding these maps to such systems is straightforward. Patrols can then use these points to guide their routes while remaining flexible regarding their order and the division of areas of responsibility between individual police officers.
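The two-stage snapping described above can be sketched in Python. This is not the ‘osmar’ workflow used in the paper: it assumes the OpenStreetMap nodes have already been exported into a coordinate array with a Boolean highway flag, both of which are hypothetical inputs, and it uses a planar approximation that is only reasonable at city scale.

```python
import numpy as np

def nearest_index(points, candidates):
    """Index of the closest candidate for each point (planar approximation).

    Both arrays hold (latitude, longitude) rows in degrees.
    """
    # Scale longitude differences by cos(latitude) so both axes are comparable
    scale = np.array([1.0, np.cos(np.radians(candidates[:, 0].mean()))])
    diff = (points[:, None, :] - candidates[None, :, :]) * scale
    return np.argmin(np.sum(diff ** 2, axis=2), axis=1)

def snap_to_highway(ridge_points, nodes, is_highway):
    """Map ridge points to their closest node, then to the nearest highway node."""
    closest_nodes = nodes[nearest_index(ridge_points, nodes)]
    highway_nodes = nodes[is_highway]
    snapped = highway_nodes[nearest_index(closest_nodes, highway_nodes)]
    # Several ridge points may share a waypoint, so duplicates are removed
    return np.unique(snapped, axis=0)
```

Note that `np.unique` returns the waypoints sorted by coordinate rather than in ridge order, which fits the intent that patrols remain flexible about the order in which points are visited.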

5 Discussion and limitations

In this exploratory study, we present DREDGE as a way to increase hot spot patrol efficiency and quality. Based on extensions from the field of cosmology, we make use of the SCMS algorithm for hot spot patrol routes. Our experiments show that optimized patrol templates cover about 94% of incidents within 0.1 miles of ridges, reaching about 99% coverage at 0.6 miles. We implement multiple realizations of our experiments to investigate the stability of crime coverage with ridges based on past data. The corresponding results demonstrate relative stability within narrow confidence intervals across differing subsamples, validating the applicability of our approach for large-scale data.

Research on hot spots maintains that crime concentrates within a small geographic area Weisburd (2015). The widespread assumption when modeling hot spots is that the epicenter of the hot spots is where police attention should be focused. For example, the Pittsburgh Police Bureau used ‘putting cops on dots’ for about 15 years Gorr and Lee (2015). Commonly employed density estimation methods can, however, obscure underlying features Eck et al (2005). The epicenter may be interpreted as a place to heavily patrol in lieu of surrounding areas that may deserve equal or more attention. Thus, this study’s objective is to provide an effective alternative, and implies a refutation of previous assumptions about optimal patrol routes within hot spots to reduce crime through deterrent effects.

Empirical analyses that use KDE techniques or similar statistical modeling approaches often serve one function, namely guiding patrol routes. The issue of the epicenter misleading officers to focus patrols on the central area of a mode is a matter of identifying which places and routes will efficiently deliver deterrence effects. Therefore, in lieu of patrolling one stopping point that is criminogenic and identifiable, these ridges use the space around the problem places that lead to the epicenter, serving a dual function of patrolling the criminogenic locations and targeting high-risk locations across normal hot spots. Prior work on patrol-routing algorithms and route selection homes in on strict routes that offer little flexibility. Thus, this application is not meant to be the primary focus of patrols but rather an addition in between duties.

Our results show the coverage and potential efficiency of ridge patrolling. The ridges calculated with data from 2018 and 95% confidence intervals in Fig. 3 depict how ridge patrolling, hot spot patrols, and epicenter patrols work. They demonstrate the ridge patrolling method to be the most efficient, exposing nearly 95% of Chicago’s Part I crime incidents to directed patrols. If used solely in the epicenter of hot spots by thresholding the data to the highest 5% of crime report densities, ridges still cover the majority of the crime area while providing patrol presence in the latter. Hence, using ridges is more equitable and responsive to the surrounding crime areas than regular epicenter patrolling. The selective or all-inclusive use of this method has the advantage and potential of being a high-efficiency crime prevention program. Furthermore, we argue that it can be a dynamic and widespread crime prevention measure across a city.

Our study is not without limitations. The methodology applied in this paper does not apply weights to problem places. Therefore, future work could focus on the application of spatial weights. In addition, this work assumes that the organization of patrol routes is implementable solely based on filament optimization, not considering community residents who may want to stop officers or demand more presence Leigh et al (2017). Given the fixed location of hot spots, the desires of residents, and the possible need to redraw routes due to calls for service or complex routes, such alterations should follow an as-close-as-possible alignment with ridges.

In addition, the program solely makes use of the geographic locations of prior incidents within a year frame. Thus, we do not account for the temporal dimension of the hot spots or incidents. However, an adaptation of this program by filtering and grouping incidents into time windows prior to running the program is feasible, which allows for more in-depth investigations. For follow-up research, we propose to combine such investigations with the separation of crime types to explore temporal changes in overall distributions, and weight shifts in types of incidents. Future work should, therefore, investigate such ‘hot times’ and DREDGE for police work, offering yet another perspective to research on spatio-temporal crime patterns (see, for example, Ratcliffe, 2004b; Newton and Felson, 1978; Malleson and Andresen, 1978).
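The proposed filtering into time windows prior to ridge estimation could look like the following sketch. The record layout of `(timestamp, latitude, longitude)` tuples, the function name, and the default window length of six hours are assumptions for illustration; `window_hours` is expected to divide 24.

```python
from collections import defaultdict
from datetime import datetime

def group_by_time_window(incidents, window_hours=6):
    """Group (timestamp, lat, lon) records into hour-of-day windows.

    Returns a dictionary mapping (start_hour, end_hour) to a list of
    (lat, lon) coordinates falling into that window on any day.
    """
    groups = defaultdict(list)
    for timestamp, lat, lon in incidents:
        start = (timestamp.hour // window_hours) * window_hours
        groups[(start, start + window_hours)].append((lat, lon))
    return dict(groups)
```

Each group of coordinates could then be passed to the ridge estimation separately, yielding one set of ridges per time window.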

Another limitation of this program is the lack of routing times for when officers should patrol each ridge, providing both spatial and temporal guidance. Building on the mention of weight shifts above, ridge segments for percentage-cut areas for different time windows could be weighted for their interest, allowing for time-dependent changes in route templates as well as a duration relying on expected crime density and types. On a more practical note, minimum durations for ridge segment traversal could be calculated through building a graph from points such as the ones covered in Section 4.3. By using either time estimates for average speed or, more advanced, linking the program into the route time prediction that navigation programs offer, traversal durations could be estimated for given segments.
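A minimal sketch of the average-speed variant is given below. It treats an ordered list of waypoints as a path, sums straight-line segment lengths, and divides by a constant speed. The function names, the Earth radius of 3,958.8 miles, and the placeholder speed of 15 mph are assumptions, and straight-line segments only give a lower bound on the duration along actual streets.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def segment_miles(p, q):
    """Haversine distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def traversal_minutes(waypoints, speed_mph=15.0):
    """Lower-bound traversal time for an ordered sequence of waypoints."""
    miles = sum(segment_miles(p, q) for p, q in zip(waypoints, waypoints[1:]))
    return 60.0 * miles / speed_mph
```

Replacing the constant speed with per-segment estimates from a navigation service would correspond to the more advanced option mentioned above.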

While our implementation performs successful density ridge estimation in a matter of minutes, this requires subsampling from larger datasets of coordinates. As a rule of thumb, we recommend using a minimum of 1,000 and a maximum of 10,000 data points to ensure representativeness and sufficiently fast runtimes. This is due to the algorithm’s complexity being $\mathcal{O}(d\cdot|\theta|^{2})$, meaning that it scales linearly with the number of dimensions, which is fixed to $d=2$ in our case of latitude-longitude coordinates, but features a polynomial runtime in the number of data points $|\theta|$ fed into the algorithm Ozertem and Erdogmus (2011). While sample sizes are, in practice, influenced by both time constraints and the size of available datasets, details on effective sample sizes in geospatial dataset resampling can be found in the literature Griffith (2005); Li et al (2016).
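As a rough illustration of this quadratic scaling, and assuming that constant factors and the number of iterations stay comparable (an idealization, not a measured result), the cost ratio between two sample sizes $n_1$ and $n_2$ at fixed dimensionality is

$$\frac{d \cdot n_2^{2}}{d \cdot n_1^{2}} = \left(\frac{n_2}{n_1}\right)^{2}, \qquad \left(\frac{10{,}000}{5{,}000}\right)^{2} = 4, \qquad \left(\frac{1{,}000}{5{,}000}\right)^{2} = 0.04 .$$

Doubling the subsample from 5,000 to 10,000 points would thus be expected to roughly quadruple the work, while 1,000 points would need about 4% of it.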

Bias in data is a general problem spanning many fields, which extends to geospatial coordinates. One prominent example is the phenomenon of over-policing and under-policing based on previous records, different socioeconomic status, and additional factors such as personal characteristics Black (1980). Since our analysis is based on reported crime incidents, one important limitation of our work relates to previous research on disparities in crime reporting. This multi-faceted issue spans contextual factors in victim and offender characteristics influencing reporting Xie and Lauritsen (2012), as well as localized reluctance to report crime incidents Slocum et al (2010). For this reason, practical implementations based on such data should always strive to take the risk of biases present in these datasets into account.

6 Conclusion

Optimizing police patrols, both city-wide and within hot spots, remains of interest to researchers and practitioners alike. For this purpose, we show how recent advances in statistics and astronomy can be used to detect principal curves, or density ridges, in crime incident distributions to extract high-density paths. Current work focuses on the hot spot’s epicenter, which reduces both the equity of patrol coverage for surrounding areas and the efficiency of hot spot patrols. Overall, we provide a way to amend these issues through a density-based patrol optimization program.

Our study uses 2018 Part I crimes from the Chicago Police Department to demonstrate the patrol templates. Comparing this output to Part I crimes from early 2019, we observe that the majority of crime reports fall into narrow envelopes around identified structures. Thus, this program allocates resources around a hot spot, covering density regions. We argue that this allocates resources equitably and optimally to prevent crimes and reduce hot spots. The combination of hot spot mapping and DREDGE has the potential of being a high-efficiency crime prevention program providing a more responsive and optimal allocation of police resources than solely targeting epicenters of hot spots. We showcase the viability of our approach with intuitive visualizations, allowing for their combination with knowledge about city-specific traffic routes to plan effective patrols while remaining not overly constrained.

Acknowledgments

We thank the City of Chicago and the Chicago Police Department for their open access data. We wish to express our gratitude to Nicholas Corsaro, Cory Haberman, and Monsuru Adepeju for helpful suggestions and comments. We also want to thank the two reviewers for their helpful comments in improving this paper.

Bibliography

1. Al Boni M, Gerber MS (2016) Automatic optimization of localized kernel density estimation for hotspot policing. In: 15th IEEE International Conference on Machine Learning and Applications, pp 32–38, DOI 10.1109/ICMLA.2016.0015
2. Barnett-Ryan C, Langton L, Planty M (2014) The nation’s two crime measures. Tech. rep., Bureau of Justice Statistics & Federal Bureau of Investigation, program report, U.S. Department of Justice, NCJ 246832
3. Bas E, Erdogmus D (2011) Principal curves as skeletons of tubular objects. Neuroinformatics 9(2):181–191, DOI 10.1007/s12021-011-9105-2
4. Black D (1980) The manners and customs of the police. New York: Academic Press
5. Bodily SE (1978) Police sector design incorporating preferences of interest groups for equality and efficiency. J Manag Sci 24(12):1301–1313, DOI 10.1287/mnsc.24.12.1301
6. Bowers KJ, Johnson SD, Pease K (2004) Prospective hot-spotting: The future of crime mapping? Br J Criminol 44(5):641–658, DOI 10.1093/bjc/azh036
7. Braga A, Papachristos A, Hureau D (2012) Hot spots policing effects on crime. Campbell Syst Rev 8(8):1–96, DOI 10.4073/csr.2012.8
8. Braga AA (2007) Policing crime hot spots. In: Preventing Crime, New York: Springer Publishing, pp 179–192, DOI 10.1007/1-4020-4244-2_12