Density Estimation on a Network
Yang Liu, David Ruppert

TL;DR
This paper introduces a new nonparametric density estimation method on networks that adaptively models potential discontinuities at vertices, improving bias correction and scalability over existing methods.
Contribution
It proposes a data-driven, two-step local polynomial regression approach that handles density discontinuities at network vertices, with theoretical bias-variance analysis and practical extensions.
Findings
Reduces bias near vertices with discontinuities
Scales sub-linearly with sample size
Effective in simulation and real dendrite network data
Abstract
This paper develops a novel approach to density estimation on a network. We formulate nonparametric density estimation on a network as a nonparametric regression problem by binning. Nonparametric regression using local polynomial kernel-weighted least squares have been studied rigorously, and its asymptotic properties make it superior to kernel estimators such as the Nadaraya-Watson estimator. When applied to a network, the best estimator near a vertex depends on the amount of smoothness at the vertex. Often, there are no compelling reasons to assume that a density will be continuous or discontinuous at a vertex, hence a data driven approach is proposed. To estimate the density in a neighborhood of a vertex, we propose a two-step procedure. The first step of this pretest estimator fits a separate local polynomial regression on each edge using data only on that edge, and then tests for…
| Case I | Case II | Case III | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Bias | SD | MSE | Bias | SD | MSE | Bias | SD | MSE | |
| LPLR | 0.0621 | 0.0593 | 0.0074 | 0.0629 | 0.0423 | 0.0057 | 0.0971 | 0.0123 | 0.0096 |
| ESDK | 0.4481 | 0.0196 | 0.2011 | 0.2711 | 0.0229 | 0.0741 | 0.1062 | 0.0121 | 0.0114 |
| ESCK | 0.4956 | 0.0184 | 0.2459 | 0.2312 | 0.0301 | 0.0542 | 0.1324 | 0.0100 | 0.0176 |
| DE | 0.4670 | 0.0172 | 0.2180 | 0.2438 | 0.0173 | 0.0597 | 0.1140 | 0.0116 | 0.0131 |
| Type II error rate | Bias | SD | MSE | |
|---|---|---|---|---|
| (3.5, 4.5) | 0.0056 | 0.0796 | 0.1480 | 0.0209 |
| (3.55, 4.45) | 0.0227 | 0.0786 | 0.1482 | 0.0199 |
| (3.6, 4.4) | 0.0540 | 0.0811 | 0.1458 | 0.0215 |
| (3.65, 4.35) | 0.1283 | 0.0833 | 0.1453 | 0.0207 |
| (3.7, 4.3) | 0.2453 | 0.0861 | 0.1487 | 0.0210 |
| (3.75, 4.25) | 0.3906 | 0.0925 | 0.1454 | 0.0203 |
| (3.8, 4.2) | 0.6030 | 0.1017 | 0.1398 | 0.0183 |
| (3.85, 4.15) | 0.7633 | 0.1040 | 0.1245 | 0.0158 |
| (3.9, 4.1) | 0.8863 | 0.0956 | 0.1135 | 0.0136 |
| (3.95, 4.05) | 0.9460 | 0.0781 | 0.1038 | 0.0127 |
| (4, 4) | NA | 0.0631 | 0.1013 | 0.0140 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical Methods and Inference · Gene expression and cancer classification · Statistical Methods and Bayesian Inference
Density Estimation on a Network
Yang Liu
Department of Statistics and Data Science, Cornell University
and
David Ruppert
Department of Statistics and Data Science and
School of Operations Research and Information Engineering, Cornell University
The authors gratefully acknowledge
Abstract
This paper develops a novel approach to density estimation on a network. We formulate nonparametric density estimation on a network as a nonparametric regression problem by binning. Nonparametric regression using local polynomial kernel-weighted least squares have been studied rigorously [11], and its asymptotic properties make it superior to kernel estimators such as the Nadaraya-Watson estimator. When applied to a network, the best estimator near a vertex depends on the amount of smoothness at the vertex. Often, there are no compelling reasons to assume that a density will be continuous or discontinuous at a vertex, hence a data driven approach is proposed. To estimate the density in a neighborhood of a vertex, we propose a two-step procedure. The first step of this pretest estimator fits a separate local polynomial regression on each edge using data only on that edge, and then tests for equality of the estimates at the vertex. If the null hypothesis is not rejected, then the second step re-estimates the regression function in a small neighborhood of the vertex, subject to a joint equality constraint. Since the derivative of the density may be discontinuous at the vertex, we propose a piecewise polynomial local regression estimate to model the change in slope. We study in detail the special case of local piecewise linear regression and derive the leading bias and variance terms using weighted least squares theory. We show that the proposed approach will remove the bias near a vertex that has been noted for existing methods, which typically do not allow for discontinuity at vertices. For a fixed network, the proposed method scales sub-linearly with sample size and it can be extended to regression and varying coefficient models on a network. We demonstrate the workings of the proposed model by simulation studies and apply it to a dendrite network data set.
Keywords: Asymptotic bias and variance, discontinuous density, kernel density estimation, local piecewise linear estimation, pretest estimation
1 Introduction
In this paper, we analyze spatial patterns of points that lie on a network. There are many types of events that can occur along networks. The lines or curves that form the network may be roads, rivers, subway lines, spider webs, blood vessels, or nerve fibers. The events may be vehicles, street crimes, car accidents, retail stores, or neuroanatomical features such as dendritic spines. For a recent summary of applications that involve the modelling of point patterns on a network of lines, see [2]. In order to carry out analyses of those events, researchers need a range of specific techniques.
One of the frequently demanded tasks is density estimation on a network, but few statistical methods had been developed to address this need until recently. A natural first attempt at analyzing such data is to take the kernel density estimate (KDE) on the one-dimensional real line
[TABLE]
and apply it directly to network data by defining as the network distance, where is any location on the network, and are the observed data locations. , where is a kernel function, and is the bandwidth. The network distance is defined as the shortest path distance between two points on a network. However, under this approach, estimate (1) does not conserve mass. This happens because that the induced kernel is not a probability density on the network at points within the -neighborhood of a vertex, and so (1) is not a probability density. As a result, it will overestimate the true density. [8] summarized widely used methods for density estimation on a network. Many published papers, such as [13], mentioned by Okabe et al. computed a kernel density estimate on a network, but most have used (1).
Although there is currently no consensus view on how to perform density estimation on a network, the two most popular heuristic techniques, namely the equal-split discontinuous kernel (ESDK) estimator and the equal-split continuous kernel (ESCK) estimator ([9], [8], [12]) seem to produce reasonable results in applications. However their methods suffer from a lack of theoretical grounding, and their computational cost is high (see Section 7.2). Furthermore, ESCK does not allow for discontinuity at vertices, which is found in many applications, such as traffic network, and the example in Figure 1 and Section 7.4. Although ESDK produces discontinuous density at vertices, the estimation in the small neighborhood of vertices always uses data from all edges, which can lead to large bias. This is demonstrated in Case I of the simulation study in Section 7.1.
More recently [7] proposed a diffusion density estimator (DE) on networks using the connection between kernel smoothing and diffusion [3]. DE is based on numerical solution of the heat equation, and it is not only faster than ESDK and ESCK, but also provides a sound statistical rationale and helps establish theoretical properties. The estimate is asymptotically unbiased with rate away from the boundary region, where is the bandwidth. At a terminal vertex, it has the standard boundary rate of KDE. Moreover, similar to ESCK, DE does not allow for discontinuity at vertices, which consequently, can lead to large bias.
In this paper, we take a different approach. We propose local piecewise linear regression (LPLR) on a network, and establish theoretical properties for the proposed method. The piecewise linear structure is needed to model the discontinuity in the regression function or the first derivative of the regression function at a vertex. By binning, we form a histogram of observations (bin centers and counts) on the network, which is then smoothed by local polynomial regression (LPR). The advantage of local polynomial estimation is reduction of the order of the bias, especially when the evaluation point is near or on the boundary vertex, and, as a practical matter, the amount of computation is reduced. To accommodate densities with discontinuous derivatives at vertices, we introduce piecewise local regression. As far as we know, local polynomial regression on networks has not be proposed and investigated and piecewise local linear regression is also new. We note that the proposed method assumes that the data comes from a Poisson process on a network, but McSwiggan et al is applicable to more general point processes. However we want to point out that our approach has better order of the bias near vertices, and it only imposes continuity at vertices when there is evidence suggested by the data that the density is continuous there. Our approach is not limited to density estimation as it can be extended to handle other types of analysis of network data, such as regression analysis. The implementation of the proposed method is fast, the computation time scales only with data sub-linearly.
To motivate our research and illustrate its contribution, we introduce the dendrite data collected by the Kosik Lab, UC Santa Barbara, and first analyzed by [1] and [6]. In this example, the events are dendritic splines, which are of clinical importance. Cognitive disorders such as ADHD and autism may result from abnormalities in dendritic spines, especially the number of spines and their maturity [10]. The events on the network are the locations of 566 spines observed on one branch of the dendritic tree of a rat neuron, as shown in Figure 1. The density of the spines shows clear discontinuity at vertices A, C and D; see Figures 7, 9, 11 and 13. We will need a density estimation method that allows for multiple levels of smoothness at vertices – discontinuous, continuous with discontinuous derivative, and continuous with continuous derivative. We show that our proposed method will remove the bias that has been noted for existing methods such as [8] and [7] due to their inflexibility at vertices. KDE, ESDK, ESCK and DE’s inflexibility arises because the smoothness of the density function at a vertex needs to be decided before choosing an estimator: if discontinuity is desired at a vertex, KDE is applied to each edge, otherwise ESDK, ESCK or DE should be applied. In contrast, we propose a data-driven approach and determine the smoothness of the density function at a vertex by statistical testing. In fact, estimation at vertices is a major contribution of this paper.
The paper is organized as follows. Section 2 gives basic definitions. In Section 3, a binned local polynomial regression estimator is constructed individually on each edge, and asymptotic properties of the estimator are established. As the evaluation point approaches the vertex from each edge, we obtain limits of the regression functions. In Section 4, we construct an asymptotic test for the equality of those limits. If the null hypothesis is not rejected, then the regression function is re-estimated for evaluation points that are within the -neighborhood of a vertex by binned local piecewise polynomial regression using data from all neighboring edges. These estimators are constructed and their asymptotic properties are studied in Section 5. In Section 6, we discuss networks with loops and related computational issues when the bandwidth is large. In Section 7, we demonstrate the workings of the proposed method with simulated data and a real example, and discuss and analyze practical issues. We end with a discussion in Section 8.
2 Preliminaries
In this section, we introduce basic definitions and formulate density estimation as a local linear regression problem.
2.1 Network and density function on a network
Let be vertices and define an edge connecting and as a curve where and . We call , a network in . A path between two points is a sequence , where are vertices, such that are edges, and , . The path length is Length, where integration is with respect to arc length. In the case where there are multiple paths between two points, we will use the shortest path to define length. If there no paths from to , we define the network distance between and to be infinity. Often, the network is embedded in a two-dimensional space, such as a street network (if there are no overpasses). However all of our results apply easily to a network of curves embedded in a higher dimensional space, such as the dendrites which are embedded in . In the theory section (Section 3, 4 and 5), we will study in deal a simple network, where there is one vertex connecting edges. We discuss application to complex networks in Section 6 and 7.
Our objective is to construct an estimate of an unobservable underlying probability density function of event locations based on observed data . The density satisfies for all , and . Stated differently, let be a realization of a Poisson process on the network. We want to estimate the intensity function of . The intensity function is the expected number of random points per unit length of network in the small neighborhood of . The estimation of and of are equivalent because, for fixed , for all .
2.2 Problem formulation
In this paper, we consider density estimation via local polynomial regression. This is achieved by way of binning. Here we use the “simple binning” discussed in [5]. For , are the bin centers, are bin counts, are bin heights, and is bin width. Letting be the total number of observations on the linear network, we have . These rectangles form a histogram with total area , since the area of the th bin is and . For a chosen , we consider the regression model . We have
[TABLE]
where is the expected proportion of points in bin , and is the regression function. We note that this assumption would still hold for more general point processes. For the variance, is the sample proportion so
[TABLE]
The variance expression depends on the Poisson process assumption. The approximate equality symbol is due to binning induces a bias, but the bias can be made negligible by a small bin width. Hall and Wand studied the accuracy of binned kernel density estimators [5], and their results show that, with the commonly-used Epanechnikov kernel, we require to go to zero faster than the bandwidth, if binning is not to have a significant effect on the bias of the estimator. Therefore, the error term has approximately mean zero and variance .
Assuming that is sufficiently small, we can smooth the histogram by local polynomial regression using , , as data. We propose a “pretest” estimation procedure that consists of the following steps:
Local polynomial regression on each edge individually. 2. 2.
Test joint equality of intercepts at the vertex. 3. 3.
If joint equality is not rejected, then the regression function is re-estimated by local piecewise polynomial regression using data from all neighboring edges for evaluation points that are within the -neighborhood of a vertex.
An asymptotic test for joint equality of slopes at the vertex can be constructed similarly.
3 Local polynomial regression
First, we consider each edge individually. This is equivalent to a fixed equally-spaced design model, where the -variables are the bin centers , and the -variables are the bin heights . We want to estimate the regression function . Using a Taylor expansion, we can approximate , where is close to a point , by a degree polynomial:
[TABLE]
provided that all the required derivatives exist. The local polynomial regression estimator minimizes with respect to the function
[TABLE]
where . Let denote the minimizer of (2). Then estimates and estimates , the th derivative of , for . Conveniently, (2) is a standard weighted least-squares regression problem. Let \boldsymbol{W}={\rm diag}\big{\{}K_{h}(x_{1}-x),\dots,K_{h}(x_{n}-x)\big{\}}. Define
[TABLE]
Assuming invertibility of , then . The estimate of the regression function at is , where is the vector with being the first entry and zero elsewhere. [11] studied the leading conditional bias and variance of the above estimator, but here we consider in detail only the local linear binned estimator. We make the following assumptions:
- A1.
The function is continuous. 2. A2.
The kernel is symmetric and supported on . Also, has a bounded first derivative. 3. A3.
As and , and , where is sample size, is bandwidth and is bin width.
It follows from the definition of the estimator that , where the vector contains the true regression function values at the each of the ’s. For local linear regression we have that
[TABLE]
Also, , where is a diagonal matrix with diagonal entries , . Note that if is a linear function then the local linear estimator is exactly unbiased. To find the leading bias term for general function , we review the result of local linear regression [4] with the binning procedure discussed in [5]. This estimator uses only data on a single edge.
Theorem 1**.**
Suppose that is a point on the line segment of interest, and that A1, A2 and A3 hold. Let , then
[TABLE]
where , ,
[TABLE]
The estimator has the following asymptotic distribution:
[TABLE]
For an interior point , we can simplify (4) and (5). Note that in this case for odd , we have
[TABLE]
The local linear estimator has better properties at the boundary than the kernel density estimator (KDE). Asymptotically, the local linear estimator’s bias is of the same order, , at a boundary point as at an interior point, where the bias would be of order for KDE. Even if is at the boundary of the density’s support, since the local linear estimator fits a weighted least squares line through data near the boundary, if the true relationship is linear, this estimator will be exactly unbiased.
Extension to higher order is straightforward, see [11]. For odd, the bias is of order everywhere. When is even, the bias is of order away from the boundary but in the boundary region. By increasing the polynomial order from even to the next odd number, the order of the bias in the interior remains unchanged, but the bias simplifies. Kernel estimation corresponds to . On the other hand, by increasing the polynomial order from odd to the next even number, the bias order increases in the interior. This effect is analogous to the bias reduction achieved by higher-order kernels.
4 Test for joint equality at vertex
When multiple edges meet at a vertex, there are as many estimates for . Consider a simple network consisting of three edges and and a vertex (see Figure 2). One estimate is , the local linear regression estimate of using only data on . The estimates and are constructed similarly. In this way, for a vertex connecting edges, we obtain estimates at that vertex. In this section, we construct a test to study whether the for (or any subset) are equal.
Let be a vertex and let , , be the edges connected by . We want to test
[TABLE]
Theorem 2**.**
Let , and be a contrast matrix such that . Under the null hypothesis, we have
[TABLE]
where , and is the asymptotic variance of the th estimator. Here denotes “asymptotically distributed as.”
We note that the test statistic (12) is invariant under the choice of contrast matrices. This is easy to see by noticing that the rows of are linearly independent.
5 Estimation with equal intercepts at the vertex
If is not rejected, then evaluation points within the -neighborhood of the vertex can be estimated using data from all neighboring edges, subject to . The resulting estimator has a lower variance compared to estimating separately on each edge.
For now, we make the following assumptions:
- B1.
No loop shorter than in the network. 2. B2.
All edges are longer than .
If the neighborhood of the evaluation point is completely covered by an edge, then the estimator is the same as if we only use data from that edge, and its asymptotic bias and variance are given in Theorem 1. Next, we derive the estimator when the neighborhood contains a vertex. Note that, by assumption B1, there is a unique path between the evaluation point and any data point that is within its -neighborhood, and by assumption B2, we need only consider the case where the neighborhood of contains exactly one vertex. For a fixed network, the above assumptions eventually will hold as and . We will consider the effect of these assumptions on the implementation of the proposed estimator in Section 6.
We note that our estimator will have an additional bias when the test in Section 4 makes a type II error, that is, when the density is not continuous at the vertex but the test accepts that it is continuous. We will investigate this problem in Section 7.3.
5.1 Deriving the estimator
Let us first consider the simple network in Figure 2, and consider a point in the -neighborhood of the vertex , and data points on , on and on . Directional distance is used here. The evaluation point is the origin when measuring distance. Note that we will use directional distance in our proposed model. When we consider a path from a point on the network to another, we use the evaluation point as the origin. For example the distance between and is and the distance between and is , which are both negative.
Using a Taylor expansion up to first order, we can approximate the regression function on at as provided that all the required derivatives exist. For the regression function on at , we have
[TABLE]
where (13) holds because that the regression functions from all edges are assumed to be equal at the vertex, (14) is a Taylor expansion of at on , and is defined as . Similarly we have
[TABLE]
For a simple network with one vertex of degree and an evaluation point on , we have for , and , , and for , .
Under the above construction, the regression functions agree at the vertex but can have different slopes there. Letting , , and for , these parameters are estimated by minimizing:
[TABLE]
where , and is the network distance, i.e., the arc length, between and . In matrix notation, we have
[TABLE]
where and is a matrix. The first column of is identically . In the th column, for and , the to entries are , where is the number of bin centers on edge , and the remaining entries are zero. In the th column of , the to entries are , and the remaining entries are . We also note that , and .
The solution to the minimization problem (15) is , and the estimate for the regression function at under the constraint is given by
[TABLE]
Note that we can extend the above derivation to quadratic and higher order approximation, allowing different for all under the constraint that are equal for .
5.2 Asymptotic properties – the local linear case ()
In this subsection, we make the following assumptions:
- C1.
The functions are equal for all . 2. C2.
The function for all are continuous on , excluding the vertex. 3. C3.
The kernel is symmetric and supported on . Also, has a bounded first derivative. 4. C4.
The vertex is always in the -neighborhood of as approaches 0, so must approach at least as fast as approaches 0. 5. C5.
As and , and , where is sample size, is bandwidth, and is bin width.
Note that if C4 does not hold, then we are in the case already studied by [11].
Theorem 3**.**
Suppose that is within the -neighborhood of a vertex that connects segments, and that C1, C2, C3, C4 and C5 hold. Let , where the matrices are defined as above, then
[TABLE]
where the matrices are defined as follows:
. 2. 2.
* is symmetric, and its first row is and its diagonal is , where . All other entries are zero.* 3. 3.
* is symmetric, and its first row is . Its *th row is and its diagonal is . All other entries of are zero. 4. 4.
* is a column vector, its first entry is , and the rest are , for .* 5. 5.
* is a column vector, its first entry is , and the rest, except the **th entry, are , for , and the *th entry is , for . 6. 6.
* is symmetric, its first row is , and its diagonal entries are . All the other entries of are zero.* 7. 7.
* is symmetric, its first row is , its *th row is
* , and its diagonal is . All other entries of are zero.*
Note that for to be in the -neighborhood of the evaluation point , we need , and for to stay in the -neighborhood of the evaluation point asymptotically, we need to approach at least as fast as approaches [math]. So is small and approaches zero as .
5.3 Estimation with equal first derivatives at the vertex
We have studied the asymptotic properties of local piecewise linear regression estimator at evaluation points that are within the -neighborhood of a vertex, under the assumption that , , agree at the vertex. Similarly, we can construct an asymptotic test for the joint equality of the limits of , , as along multiple edges. If we accept the null hypothesis that the first derivatives are equal at the vertex, this constraint should be added to the estimation procedure. For example, in Figure 2, consider an evaluation point and a data point . Using local linear approximations, we have
[TABLE]
The regression function at has a similar expansion. In the last line we used , and expand around the evaluation point on . However to estimate first derivatives, one should use at least local quadratic polynomials.
6 Practical Issues
6.1 Estimation with large bandwidth
In this section, we will describe a more complex situation. When there are loops in the network, we use the shortest path between the evaluation point and a data point, and the distance between the points is defined by the shortest path distance. When there are multiple vertices in the -neighborhood of an evaluation point, and vertices have different results from the joint equality tests, we will show that only data points that have direct access to the evaluation point contribute to the estimation. If is the evaluation point, is a data point that is within the -neighborhood of , and are the vertices on the shortest path between and , we say that has direct access to if the H0 (Theorem 2) is accepted at all , . We will demonstrate this via the following example.
Here is the evaluation point, and , and are data points on edges , and respectively. We consider two scenarios:
is accepted at and , 2. 2.
is accepted at and rejected at .
In the first scenario, by Taylor expansions, we have
[TABLE]
Now, since is accepted at and , we have and , and (19) and (20) become
[TABLE]
The problem of minimizing can be set up in the same way as before. However, in the second scenario, since we do not have , the problem becomes minimization of . We see that the estimation of only depends on the first summation. In other words, only data points that have direct access to the evaluation point contribute to the estimation of .
6.2 Bin width
For fixed sample size, we will let the bin width . For simplicity, we show how this affects the calculation of . In the case where the regression function is estimated on each edge individually, when , to calculate for near the vertex, we use that the Riemann sum converges to the integral as :
[TABLE]
where , , and is the th truncated moment. Note that when the evaluation point is the vertex itself, . The calculation of the term requires
[TABLE]
Here is bar height and is bar width, so is bar area, which equals to , where is bar count and is sample size. Let be the location of th observation. As , eventually each bin is either empty or has exactly one observation in it. Therefore bin count is [math] for most bins, and for bins that contain one observation, and the location of that observation becomes the bin center in the limit.
When the joint equality constraint is added, to approximate for near the vertex, we also need to calculate, for , ,
[TABLE]
Binning with fixed might still be used with very large sample size.
7 Implementation
7.1 Simulation studies
Let us consider the simple network shown in Figure 2: three edges meet at a vertex. We propose three interesting cases on this network using the beta distribution. The three cases that we report here are
We simulate points on each edge, with the vertex being the origin, from , , and respectively. 2. 2.
We simulate points on each edge, with the vertex being the origin, from . 3. 3.
For each edge, with the vertex being the origin, we simulate points from the truncated (from to ) . Then the points are shifted to the vertex by and finally multiplied by , so the support of the true density on each edge is from the vertex to a point that is unit distance away from the vertex.
The true density function over the network is normalized so it integrates to . Our simulation study focuses on demonstrating that the proposed estimator can accommodate various behaviors of the true regression function (the true density function), especially at the vertex, namely, discontinuous (Case I), continuous with discontinuous first derivative (Case II), and continuous with continuous first derivative (Case III).
Simulation results for one dataset are shown in Figure 4. The result for local piecewise linear density is presented in the last row. The network is in red, and the true density functions over the network are the black dashed lines. The blue lines are our estimates. The first column is case I, where the joint equality test for the regression functions at the vertex is rejected, and our estimate is equivalent to local linear regression on each edge. The second column is case II. We first fitted a local linear regression model on each edge, and tested joint equality at the vertex. Since we fail to reject the null hypothesis, we re-estimated the regression function in the -neighborhood of using data from all neighboring edges subject to the equality constraint. Turning to case III in the third column, the estimate is also subject to the joint equality constraint.
In case III, in addition to joint equality of the regression functions at the vertex, joint equality of their first derivatives is also established. For better viewing and comparison between different amounts of smoothness at the vertex, Figure 5 zooms in at the vertex. Estimation in left panel of Figure 5 is subject to only one constraint (joint equality of the regression functions at the vertex). However, after testing, by adding the constraint that the first derivatives are jointly equal, we see in the right panel of Figure 5 that the estimated curve is smoother over the vertex. This is desired because the true density function is continuous and has continuous first derivative over the vertex.
We should note that any sufficiently small bin width will do, but selection of the bandwidth takes more care. We note that cross-validation could be used for bandwidth selection, and more computationally efficient bandwidth selector for polynomial regression on networks is under investigation.
7.2 Comparison with existing methods
Let us consider again the three cases described in Section 7.1 and apply the equal-split discontinuous kernel estimator (ESDK) [9], the equal-split continuous kernel estimator (ESCK) [8], and the diffusion estimator (DE) [7] to each case.
ESDK [9] modifies (1), so it conserves mass. The algorithm makes a copy of the kernel function for each evaluation point, confined to the line edge containing that point. At each fork, the remaining tail of the kernel is split equally between the outgoing edges. Suppose there are outgoing edges, each outgoing edge receives a copy of the kernel tail weighted by . ESCK [8] uses another modified version of (1). At each fork, with outgoing edges, each outgoing edge receives a copy of the kernel weighted by , while the incoming segment receives a copy with the negative weight . DE [7] uses the estimator , where is the heat kernel on a network [3]. Intuitively, is the probability that a Brownian motion on the network, started at location at time [math], will fall in the infinitesimal interval of length around the point at time . The bandwidth parameters for ESDK, ESCK, and DE are selected by cross validation.
Figure 4 illustrates the performances of ESDK, ESCK and DE, compared to LPLR. Columns one, two and three correspond to cases I, II and III, respectively. For case I (discontinuous density), both ESCK and DE produce a continuous estimate at the vertex. ESDK, however, produces a discontinuous estimate, but the bias is still significant. Although ESDK is not a standard kernel density estimator, one should note that the asymptotic bias of kernel density estimation has order of on the boundary, whereas local linear estimation has order of . The significant bias at the vertex can be also due to the fact that ESDK uses data from all neighboring edges as the evaluation point approaches the vertex, so it overestimates low density edges and underestimate high density edges.
For case II (continuous density with discontinuous first derivative), ESDK produces a discontinuous estimate. Although ESCK and DE produce continuous estimates, their bias are considerably higher compared to local piecewise polynomial estimation. This is due to lower order of the asymptotic bias of the kernel estimator near a vertex.
For case III (the density and its first derivative are both continuous) the vertex behaves like an interior point. While ESDK still produces a discontinuous estimate, ESCK and DE are comparable to local piecewise linear estimation, because the asymptotic bias of kernel density estimation and local linear estimation are both of order at an interior point. However, ESCK and DE do not estimate the derivatives of the estimated curves as the evaluation point approaching a vertex from multiple directions. Hence, these methods are inadequate when the derivatives are of interest.
Finally we report the bias, standard deviation, and mean squared error of the proposed local (piecewise) polynomial regression compared to all existing methods. In each case (I, II and III), the result is based on simulation of data sets, and each data set consists of data points on each edge. Bias, standard deviation and mean squared error are reported at the vertex as it is approached from (see Figure 2). The simulation result is summarised in Table 1. We see that ESDK, ESCK and DE only produce comparable results when the vertex behaves like an interior point, whereas LPLR is superior in all other cases. Simulations are run on a Macbook Pro Mid 2015 with 2.5 GHz Intel Core i7. Average time (Case I, II and III) for LPLR, ESDK, ESCK and DE are 6.0082 secs, 4.3217 hrs, 6.4872 hrs and 6.6266 secs for all datasets. ESDK, ESCK and DE are implemented in the R package spatstat. We note that the proposed local linear estimator requires at least moderately large data sets in order to produce good models. But for a fixed network and a fixed number of evaluation points, since the optimal bandwidth is of order , the computation time only scales sub-linearly with sample size. More generally, the computation time is per evaluation point. If the number of evaluation points is , then the computation time grows like .
7.3 Additional bias
The proposed method is subject to additional error because of the limitations of the test described in Section 4. For a large network, we would have to consider a large set of statistical tests simultaneously, and multiple testing problem would likely occur. It means that among the vertices at which the true density is continuous, the test rejected continuity at least one of them. However the main focus of this paper is the estimation of density function at individual locations, and the bias introduced by mistaking a continuous vertex as a discontinuous vertex is just the boundary bias of a local polynomial regression estimator. If one wants to count the number of vertices at which the density function is discontinuous, numerous correction techniques have been proposed in the literature, such as the Holm-Bonferroni method [holm1979simple]. Hence we will focus on type II error.
Our estimator will also have an additional bias when the test that the density is continuous at a vertex makes a type II error, that is, when the density is not continuous but the test accepts that it is continuous. The size of additional bias introduced by a type II error is less clear. We investigate this problem through simulations. For simplicity, let the origin be a vertex, and let the two unit-length edges meet at the vertex. On the left edge, we first sample 1000 points from Beta, then multiply the sample by , so the data points on the left edge range from to [math]. On the right edge, we draw 1000 point from Beta. Pairs of are given in the first column of Table 2. Type II error rate is the probability that the test (Theorem 2) accepts the null hypothesis that the true density is continuous at a vertex when it is not. We perform simulations of data points per edge for each pair of . We report the bias, standard deviation and mean squared error at the vertex approached from the right edge. The result is summarized in Table 2. We see that when type II error rates are large, LPLR under the continuity constraint produces little additional bias due to small gaps between and . Similarly, when and are far apart, there is also little additional bias due to small type II error rates. On the other hand, when type II error rate is in the mid range, for example from to , there is considerable amount of additional bias at the vertex. Hence in the bias column, we should see bigger (in magnitude) biases in the middle rows, and as we approach the top and bottom rows, biases become smaller (in magnitude). Finally we note that the top row has bigger bias than the bottom row, despite that the type II error rate is practically zero. This is due to that local linear regression generally has bigger bias for functions with greater curvature (Beta compared to Beta).
7.4 Application to real data
In this section we apply the proposed local piecewise polynomial regression to dendrite data. The data was collected by the Kosik Lab, UC Santa Barbara, and first analyzed by [1] and [6]. Dendrites are branching filaments which extend from the main body of a neuron (nerve cell) to propagate electrochemical signals. Spines are small protrusions on the dendrites. The network shown in Figure 1 is one of the ten dendritic trees of this neuron. A dendritic tree consists of all dendrites issuing from a single root branching off the cell body; each neuron typically has 4 to 10 dendritic trees. This example was chosen because it is large enough to demonstrate our techniques clearly, without being too large for graphical purposes. The events on the network are the locations of 566 spines observed on one branch of the dendritic tree of a rat neuron. We will show the result of applying the existing methods and the proposed local (piecewise) polynomial estimator to the dendrite data of Figure 1. We show density estimates at vertices A, B, C and D. We used a fixed bandwidth of microns (the network has a total length of microns), and we will leave bandwidth selection and variable bandwidths to future projects. Unlike ESDK, ESCK and DE, the continuity of the LPLR estimates at a vertex is only imposed when there is evidence that the density is continuous there. Since the proposed method is particularly advantageous near vertices, we will mostly focus on that region.
Figure 6 shows density estimates near vertex A, using ESCK, ESDK and DE. We used a fixed bandwidth of microns, and an Epanechnikov kernel in ESCK and SEDK. Both ESCK and DE produced continuous estimates at the vertex. Although ESCK produced discontinuity at the vertex, the estimation clearly used data from all edges. Consequently the estimated density on the background edge near the vertex is overly inflated. In Figure 1, the data suggests that the density function has different behaviours when approaching vertex A from different directions. However none of the exisiting methods can characterise this. Next we apply the proposed piecewise local linear regression approach, and the result is in Figure 7. Since the data does not support continuity at vertex A, no further constraint estimation is required.
Next we look at vertex B in Figure 1. Unlike vertex A, spines around vertex B seem to become more sparse as approaching the vertex from all directions, especially from the two edges on the left. Apply ESCK, ESDK and DE, we get the estimates in Figure 8. ESCK and DE produced continuous estimates at vertex B. Under these two approaches, the estimated density is continuous at the vertex over any pair of edges, i.e. is continuous as travels from to , where and are two edges connected by the vertex. However continuity may only exist over a subset of edges, and ESCK, ESDK and DE are unable to provide such flexibility. By the proposed piecewise local linear estimation, we first estimate the density on each edge individually using only the data on that edge. This is panel (a) in Figure 9. By the testing procedure discussed in Section 4, there’s strong evidence in the data that the densities from the left two edges are equal when approaching the vertex. Following the constrained estimation in Section 5, we re-estimate the densities on the two edges by local piecewise estimator using data from all three edges subject to the constraint that the densities on the left two edges are equal at the vertex. The result is in panel (b). Suppose for are connected by vertex , let be the density over . Using the proposed piecewise local linear estimation, we can test for the equality of for any subset of ,as , and achieve the desired continuity as suggested by the data. On the other hand, under ESCK, ESDK and DE, the continuity at a vertex is decided by the user prior to model fitting, and the decision making is not data-driven.
Similarly, for vertex C, under piecewise local linear estimation, there’s evidence in the data that densities from the two edges in the foreground are equal when approaching the vertex. Figure 11 shows the estimated densities near vertex C, by ESCK, ESDK and DE, while piecewise local linear estimation is in Figure 11.
For vertex D, under piecewise local linear estimation, there’s evidence in the data that the densities from the two left edges are equal when approaching the vertex. Figure 13 shows the estimated densities near vertex D, by ESCK, ESDK and DE, while piecewise local linear estimation is in Figure 13. From this example we see that the network is embedded in . The vertex is on the right of the plot where labeled “vertex D”. There is an overpass on the left of the plot, where the two edges do not intersect.
8 Discussion
As we mentioned in the introduction, there is great potential demand in many fields for estimating the density of events on a network. An easy method for such estimations is to use the ordinary kernel density estimation method that assumes an unbounded plane, or kernel density estimation on the real line with the Euclidean distance replaced by network distance. Many papers in the literature employ this method. However, this method yields a bias in density estimation and so the method is likely to lead to misleading conclusions. We also discussed the equal-split discontinuous kernel estimator, the equal-split continuous kernel estimator, and diffusion estimator. None of those methods allows for discontinuity in the estimates. The first two methods lack theoretical justification and are computationally expensive. The diffusion estimator is mathematically equivalent to an infinite-sum generalization of the equal-split continuous rule applied to the Gaussian density, and it inherits the asymptotic properties of a kernel density estimator [7]. Also the diffusion estimator has a slower rate for the boundary bias.
In this paper, we have formulated a density estimation procedure on a linear network via local piecewise polynomial regression by way of binning. We first apply local polynomial regression on each edge individually, then we test joint equality of the regression functions at the vertex. If the null hypothesis is not rejected, locations within the -neighborhood of the vertex are re-estimated by local piecewise polynomial regression using data from all neighboring edges, subject to the equality constraint. The proposed piecewise linear procedure only imposes continuity at vertices when there is evidence in the data, and its asymptotic bias has the same rate at a vertex as an interior point. We studied the local linear case in detail, while there is a straightforward extension to higher-order polynomial approximation. When applying the proposed method to real data, if there are loops in the network, for simplicity, we only considered the shortest path between points, and we showed that only data points that have direct access to the evaluation point contribute to the estimation of the regression function.
Due to space limitation, we have only considered only fixed-bandwidth smoothing, and we have not considered data-based bandwidth selection nor adaptive smoothing on a network. We proposed a test of equal intercepts at a vertex to decide whether to assume equal intercepts when estimating the density in the neighborhood of the vertex. In the future, we will consider an estimator that, rather than using a pre-test, shrinks the unequal-intercepts estimator towards the equal-intercepts estimator. We have assumed that all locations are measured without error and lie exactly on the linear network. However this is not true for some applications, such as ambulance or taxi, where there are GPS error in their locations. Further study into such measurement error problems is required. In addition to estimating probability density or point process intensity, the proposed procedure is also used for regression problems, such as varying coefficient models on a linear network.
9 Proofs
Proof of Theorem 1.
By Taylor expansion,
[TABLE]
Note that if is a linear function then for so that the local linear estimator is exactly unbiased when is a linear function. To find the leading bias term for general function , note that
[TABLE]
where . Some straightforward matrix algebra then leads to the following expression for the leading bias term
[TABLE]
To derive the asymptotic variance of we have
[TABLE]
where , with . We stress that depends on and , but we drop them from its notation for simplicity. Now use approximation analogous to those used above we have
[TABLE]
where . These expressions can be combined to obtain
[TABLE]
where , and
[TABLE]
∎
Proof of Theorem 2.
Recall the asymptotic result of local polynomial regression:
[TABLE]
where and are the asymptotic bias and variance of . Let , we have asymptotically, and ,
[TABLE]
Now consider a contrast matrix such that (i.e. each row sum to zero). One choice of is
[TABLE]
which simply contrasts edge 1 with edge 2, edge 1 with edge 3. Then
[TABLE]
where
[TABLE]
Under the null hypothesis,
[TABLE]
We want to calculate the probability of generating a point at least as unlikely as the observed data point. To do that, we note that under the null hypothesis, the Mahalanobis distance follows a chi-square distribution
[TABLE]
The test statistic is invariant under the choice of contrast matrices. This is easy to see by noticing that the rows of are linearly independent. So we have a basis of some vector space (and it doesn’t matter if is all of , or some subspace thereof), and two different ordered bases for , and (necessarily of the same size, since two bases of the same vector space always have the same size, here they are the transpose of two contrast matrices and ):
[TABLE]
A change-of-basis matrix is a matrix that translates from coordinates to coordinates. That is, is a change-of-basis matrix (from to ) if, given the coordinate vector of a vector relative to , then gives the coordinate vector of relative to , for all in .
To get a change-of-basis matrix, we write each vector of in terms of , and these are the columns of , for , . We know we can do this because is a basis, so we can express any vector (in particular, the vectors in ) as linear combinations of the vectors in . Then the change-of-basis matrix translating from to is
[TABLE]
Matrix is always invertible. This is because just like there is a change-of-basis from to , there is also a change-of-basis from to . Since is a basis, we can express every vector in using the vectors in , for , . So the matrix , with
[TABLE]
has the property that given any vector , if is the coordinate vector of relative to , then is the coordinate vector of relative to . Then applying first and then translates coordinates into coordinates and back to coordinates, and thus must be the identity matrix (likewise for ). So and are both invertible, and every change-of-basis matrix is necessarily invertible. ∎
Proof of Theorem 3.
Standard calculation shows that the leading term of the bias is given by
[TABLE]
where is a column vector with entries being , for , and . Note that for data points , where is the edge that the evaluation point is located, the entries are simplified to , for . To approximate , we note that
[TABLE]
where . If follows that
[TABLE]
where . The matrix is symmetric, and its first row is and its diagonal is . All other entries are zero. We also have that matrix is symmetric, and its first row is . Its th row is and its diagonal is . All other entries of are zero. Finally . Take inverse we have
[TABLE]
To approximate , we note that
[TABLE]
Combining this result with the approximations before, we get
[TABLE]
where , , and are column vectors. The first entry of is , and the rest are , for . The first entry of is , and the rest, except the th entry, are , for , and the th entry is . The first entry of is , and the rest, except the th entry, are , for , and the th entry is . Finally, the first entry of is [math], and the rest, except the th entry, are , for , and the th entry is . Consequently,
[TABLE]
To derive the asymptotic variance of we have
[TABLE]
where , with . We stress that depends on and , but we drop them from its notation for simplicity. Now use approximation analogous to those used above we have
[TABLE]
It follows that
[TABLE]
where is symmetric, its first row is , and its diagonal entries are . All the other entries of are zero. is symmetric, its first row is , its th row is , and its diagonal is . All other entries of are zero. Finally is also symmetric, its first row is the zero vector, its th row is , and its diagonal is . All other entries of are zero.
Combining the above expressions, we have
[TABLE]
∎
Supplementary Materials
The supplementary materials include an R program containing code to perform the local linear regression method for density estimation on a network as described in this article. The program also contains all codes for simulating datasets used in the article.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Baddeley, A, Jammalamadaka, A and Nair, G. Multitype point process analysis of spines on the dendrite network of a neuron. Journal of the Royal Statistical Society: Series C (Applied Statistics) , 63(5):673–694, 2014.
- 2[2] Baddeley, A, Rubak, E and Turner, R. Spatial point patterns: methodology and applications with R . Chapman and Hall/CRC, 2015.
- 3[3] Botev, Z, Grotowski, J and Kroese, D. Kernel density estimation via diffusion. The annals of Statistics , 38(5):2916–2957, 2010.
- 4[4] Fan, J. Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics , pages 196–216, 1993.
- 5[5] Hall, P and Wand, M. On the accuracy of binned kernel density estimators. Journal of Multivariate Analysis , 56(2):165–184, 1996.
- 6[6] Jammalamadaka, A, Banerjee, S, Manjunath, B, and Kosik, K. Statistical analysis of dendritic spine distributions in rat hippocampal cultures. BMC bioinformatics , 14(1):287, 2013.
- 7[7] Mc Swiggan, G, Baddeley, A and Nair, G. Kernel density estimation on a linear network. Scandinavian Journal of Statistics , 44(2):324–345, 2017.
- 8[8] Okabe, A and Sugihara, K. Spatial analysis along networks: statistical and computational methods . John Wiley & Sons, 2012.
