Nonparametric Functional Approximation with Delaunay Triangulation
Yehong Liu, Guosheng Yin

TL;DR
The paper introduces a differentiable nonparametric method called Delaunay triangulation learner (DTL) that partitions feature space into simplices for functional approximation, combining geometric optimality with linear modeling.
Contribution
It presents a novel DTL algorithm that leverages Delaunay triangulation for nonparametric function approximation, with theoretical analysis and empirical comparison.
Findings
DTL effectively partitions feature space into simplices.
Theoretical properties of DTL are rigorously analyzed.
DTL outperforms some existing learners in numerical studies.
Abstract
We propose a differentiable nonparametric algorithm, the Delaunay triangulation learner (DTL), to solve the functional approximation problem on the basis of a $p$-dimensional feature space. By conducting the Delaunay triangulation algorithm on the data points, the DTL partitions the feature space into a series of $p$-dimensional simplices in a geometrically optimal way, and fits a linear model within each simplex. We study its theoretical properties by exploring the geometric properties of the Delaunay triangulation, and compare its performance with other statistical learners in numerical studies.
Yehong Liu
Department of Statistics and Actuarial Science
The University of Hong Kong
Pokfulam Road, Hong Kong
Email: [email protected] and [email protected]
Guosheng Yin
Department of Statistics and Actuarial Science
The University of Hong Kong
Pokfulam Road, Hong Kong
Email: [email protected] and [email protected]
KEY WORDS: Convex optimizations, Curvature regularization, Nonparametric regression, Piecewise linear, Prediction.
1 Introduction
In recent years, the great success of the deep neural network (DNN) has spurred the machine learning community to formulate predictive models as solutions to differentiable optimization problems. The distributions of samples, e.g., images and languages, are assumed to be concentrated in the regions of some low-dimensional smooth functionals (manifolds), such that even if the samples are not strictly on the manifold, the error can be very small. More specifically, the manifold assumption can be summarized in three parts:
- (1) The underlying model (i.e., the principal functional) for the data distribution is on a low-dimensional functional.
- (2) The principal functional is smooth but complicated in shape.
- (3) The errors between the functional and real data are very small relative to the variance of the functional.
These assumptions are the prerequisites of the success of DNN. First, the DNN is a universal approximator, which ensures its flexibility in shape when approximating the principal surface of the model. Second, the sparse techniques used in DNN, e.g., dropout and pooling, make it possible to regularize the network to concentrate on low-dimensional manifolds. Third, the sample size used in the training process of DNN is usually large, which ensures that DNN can well approximate the principal functional. Therefore, DNN is a successful example of a nonparametric, flexible and differentiable approximator, whose main weakness is interpretability. We focus on the problem of approximating a low-dimensional smooth functional with small errors or without errors, and propose an interpretable, geometrically optimal, nonparametric and differentiable approach, the Delaunay triangulation learner (DTL).
The DTL has a number of attractive properties:
- (1) DTL fits a piecewise linear model, which is well established and can be easily interpreted.
- (2) DTL naturally separates the feature space in a geometrically optimal way, and it has local adaptivity in each separated region when fitting a smooth functional.
- (3) Compared with general triangulation methods (e.g., random triangulation), where the geometric structure of the triangles can be difficult to analyze, the Delaunay triangulation admits many useful geometrical and statistical properties, which opens a new direction for statistical research on piecewise linear models.
- (4) Based on the construction of the DTL, we can define the regularization function geometrically to penalize the roughness of the fitted function.
- (5) DTL can accommodate multidimensional subspace interactions in a flexible way, which is useful when the output of the model depends on the joint effect of a group of covariates.
- (6) DTL is formulated as a differentiable optimization problem, which can be solved via a series of well-known gradient-based optimizers.
To better illustrate the advantages of the DTL both theoretically and empirically, we present the low-dimensional settings where there are only a small number of features (), while leaving extensions to high-dimensional settings in our future work.
2 Delaunay Triangulation
For ease of exposition, we use ‘triangle’ and ‘simplex’ interchangeably throughout this paper, as a triangle is a two-dimensional simplex. All definitions and theories are established on general dimension (). In geometry, triangulation is often used to determine the location of a point by forming a simplex to the point from other known points. Let be a set of points in the $p$-dimensional space (), and suppose that the convex hull of the points has a nonzero volume. The Delaunay triangulation of the points is defined as follows.
Definition 2.1.
Given a set of points in the $p$-dimensional Euclidean space, the Delaunay triangulation is a triangulation such that no point of the set lies inside the circumscribed sphere of any other triangle of the triangulation.
This definition is illustrated in Figure 1 (a). Geometrically, it has been well established that for any given points in general position (the convex hull of the points is non-degenerate in $p$-dimensional space), there exists only one Delaunay triangulation, i.e., the Delaunay triangulation is unique (Delaunay, 1934). Intuitively, the Delaunay triangulation algorithm results in a group of simplices that are the most regular in shape, in comparison to any other type of triangulation. As shown in Figure 1 (b) and (c), in a two-dimensional space, the mesh constructed by the Delaunay triangulation has fewer skinny triangles compared with that by a random triangulation. Figure 2 shows a saddle surface and its Delaunay triangulation.
Several algorithms can be used to implement the Delaunay triangulation, including the flip algorithm, the Bowyer–Watson algorithm (Bowyer, 1981; Watson, 1981), and the divide-and-conquer paradigm (Cignoni et al., 1998). The flip algorithm is fairly straightforward: it constructs a triangulation of the points and flips the edges until every triangle is of the Delaunay type. It takes $O(n^2)$ time for the edge-flipping operation, where $n$ is the number of points. The Bowyer–Watson algorithm is relatively more efficient as it takes $O(n \log n)$ time, by repeatedly adding one vertex at a time and re-triangulating the affected parts of the graph. In the divide-and-conquer algorithm, a line is drawn recursively to split the vertices into two groups, and the Delaunay triangulation is then implemented for each group. The computational time can be reduced to $O(n \log \log n)$ owing to some tuning techniques. For higher dimensional cases ($p > 2$), Fortune et al. (1992) proved that the plane-sweep approach produces the worst case time complexity of $O(n^{\lceil p/2 \rceil})$ (Klee2015). In this paper, the Delaunay triangulation algorithm is implemented by using the Python SciPy package (Jones et al., 2001).
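As a minimal sketch of the SciPy-based construction mentioned above, the following builds a two-dimensional Delaunay mesh with `scipy.spatial.Delaunay` and checks the empty-circumcircle property from Definition 2.1. The sample size, random seed, helper names (`circumcircle`, `is_delaunay`) and the tolerance are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcircle(tri_pts):
    """Return (center, radius) of the circle through three 2-D points."""
    a, b, c = tri_pts
    # The center u is equidistant from a, b, c:
    #   2 (b - a) . u = |b|^2 - |a|^2  and  2 (c - a) . u = |c|^2 - |a|^2
    lhs = 2.0 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    center = np.linalg.solve(lhs, rhs)
    return center, np.linalg.norm(center - a)

def is_delaunay(points, simplices, rel_tol=1e-8):
    """True if no point lies strictly inside any triangle's circumcircle."""
    for simplex in simplices:
        center, radius = circumcircle(points[simplex])
        dist = np.linalg.norm(points - center, axis=1)
        if np.any(dist < radius * (1.0 - rel_tol)):
            return False
    return True

rng = np.random.default_rng(0)      # illustrative seed
points = rng.random((30, 2))        # 30 random points in the unit square
mesh = Delaunay(points)             # mesh.simplices holds vertex indices per triangle

print(len(mesh.simplices), is_delaunay(points, mesh.simplices))
```

The check is quadratic in the number of points and is meant only to illustrate the definition, not to be used on large samples.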
3 Delaunay Triangulation Learner
3.1 Formalization of DTL
As shown in Figure 2, given the data from a probabilistic model (joint distribution) , where and , the DTL is formalized as follows.
- (1) Delaunay Partition
We partition the feature space with a triangle mesh by conducting Delaunay triangulation on and obtain a set of simplices, denoted as . The convex hull of the Delaunay triangulation, denoted as , is unique by definition. For any point in the feature space, if , there exists a simplex in , denoted as , that contains ,
[TABLE]
where is the indicator function. If , there exists a point nearest to , with the corresponding response
- (2) Parabolic Lifting
Let denote a vector of location parameters. We construct the DTL as a linear interpolation function based on , which is denoted as . The linear interpolation function takes the form
[TABLE]
where is a -dimensional linear function defined on the simplex , satisfying
[TABLE]
In summary, the DTL partitions the feature space with a Delaunay triangulation algorithm. It is a nonparametric functional learner as the dimension of the parameter grows at the same pace as the sample size $n$.
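The prediction rule described above can be sketched for a single two-dimensional triangle: inside the triangle the fitted value is the barycentric (linear) interpolation of the location parameters at the vertices, and outside it the response of the nearest observed point is returned. This is a simplified illustration under stated assumptions; the function names, the single-triangle stand-in for the full mesh, and the tolerance are hypothetical.

```python
import math

def barycentric(x, tri):
    """Barycentric coordinates of the 2-D point x with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x[0] - x3) + (x3 - x2) * (x[1] - y3)) / det
    l2 = ((y3 - y1) * (x[0] - x3) + (x1 - x3) * (x[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def dtl_predict(x, tri, psi, responses):
    """Linear interpolation inside the triangle, nearest-neighbour response outside.

    tri       : three vertices (a one-triangle stand-in for the Delaunay mesh)
    psi       : location parameters at the three vertices
    responses : observed responses at the three vertices
    """
    lam = barycentric(x, tri)
    if all(l >= -1e-12 for l in lam):           # x is covered by the triangle
        return sum(l * v for l, v in zip(lam, psi))
    nearest = min(range(len(tri)), key=lambda i: math.dist(x, tri[i]))
    return responses[nearest]

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
psi = [1.0, 3.0, 5.0]                           # plane z = 1 + 2x + 4y at the vertices
inside = dtl_predict((0.25, 0.25), tri, psi, psi)
outside = dtl_predict((2.0, 0.5), tri, psi, psi)
print(inside, outside)
```

In a full implementation the covering simplex would be located in the Delaunay mesh first; here the single triangle plays that role.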
3.2 Optimization
In the DTL functional optimization problem, one has a response variable and a vector of input variables in a $p$-dimensional feature space. Given the data , our goal is to obtain an estimator by solving an optimization problem,
[TABLE]
where is the loss function, is a tuning parameter and is a regularization function measuring the roughness of . The loss function can be squared loss or absolute loss for regression problems, and exponential loss for classification problems.
As for the regularization function , we propose using the total discrete curvature as a measure of roughness. Suppose that a vertex is inside the convex hull , and let denote the set of simplices that contain . As a linear interpolation function, the DTL is strictly linear in each triangulated region. Through the function , each simplex has a corresponding -dimensional simplex as its functional image. More specifically, is a -dimensional simplex for any . Furthermore, we define a -dimensional vector as the standard up-norm vector of , which satisfies
[TABLE]
where , is the vector composed of the first -dimensional elements of , and is a -dimensional gradient vector of , where
[TABLE]
Here, are the vertices of the same simplex and are the corresponding location parameters of the DTL. The first equation in (3.1) ensures that the vector is orthogonal to the simplex . The second implies that the vector is normalized, and the third requires the vector to be in the upward direction.
As the up-norm vector shares the same projection as the gradient vector on the feature space, it takes the form of , where is an unknown constant. By solving the equations in (3.1), we obtain , and thus the standard up-norm vector can be written as
[TABLE]
Figure 3 exhibits the geometric view of , and Figure 4 shows a plot of the standard up-norm vectors on a discrete surface.
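The construction of the standard up-norm vector can be illustrated in two dimensions. The sketch below assumes the usual closed form for the upward unit normal of a plane with gradient $g$, namely $(-g, 1)/\sqrt{1 + \|g\|^2}$, which satisfies the three conditions described above (orthogonality to the lifted simplex, unit length, positive last component). The function names are hypothetical.

```python
import math

def plane_gradient(tri, psi):
    """Gradient (gx, gy) of the plane through the points (x_i, y_i, psi_i)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    z1, z2, z3 = psi
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    gx = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    gy = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    return gx, gy

def up_normal(grad):
    """Unit normal of the plane with the given gradient, last component positive."""
    scale = math.sqrt(1.0 + sum(g * g for g in grad))
    return tuple(-g / scale for g in grad) + (1.0 / scale,)

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
psi = [1.0, 3.0, 5.0]                 # heights of the lifted vertices
grad = plane_gradient(tri, psi)
normal = up_normal(grad)
print(grad, normal)
```

The normal is orthogonal to every edge of the lifted triangle, which can be checked by taking inner products with the edge vectors $(1, 0, 2)$ and $(0, 1, 4)$ of this example.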
Let denote the number of elements in , and define the total discrete curvature as
[TABLE]
where represents the degree of the angle between and , and is the number of all possible combinations for . The total discrete curvature is calculated by measuring the angle between each pair of the up-norm vectors and averaging the total degree of the angles grouped by each vertex. A special case is that all the simplices of the DTL collectively form a hyperplane in the -dimension, where all the standard up-norm vectors are identical and the total discrete curvature equals zero. As the cosine of an angle formed by two unit vectors is equal to the inner product of the vectors, it is equivalent to defining the regularization function as
[TABLE]
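The verbal description of the total discrete curvature (pairwise angles between up-norm vectors, averaged per vertex) can be sketched as follows. The exact normalization in the paper's formula is not reproduced here; the per-vertex average over all pairs and the sum over vertices are assumptions based on the description, and the function names are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_angle(normals):
    """Average angle (radians) over all pairs of unit normals at one vertex."""
    pairs = list(combinations(normals, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        dot = sum(a * b for a, b in zip(u, v))
        total += math.acos(max(-1.0, min(1.0, dot)))   # clamp rounding noise
    return total / len(pairs)

def total_discrete_curvature(normals_by_vertex):
    """Sum of the per-vertex average angles over the interior vertices."""
    return sum(mean_pairwise_angle(ns) for ns in normals_by_vertex)

flat = [[(0.0, 0.0, 1.0)] * 4]                          # one vertex, four coplanar simplices
bent = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]             # two perpendicular normals
print(total_discrete_curvature(flat), total_discrete_curvature(bent))
```

A flat fit gives zero curvature, matching the special case noted above, while any fold between neighbouring simplices contributes a positive angle.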
Both the loss function and the regularization function are differentiable with respect to , thus all the gradient-based optimizers can be applied to solve the optimization problem of the DTL. The gradient of the objective function is given by Algorithm 1 displays the Adaptive Moment Estimation (Adam) algorithm (Kingma and Ba, 2015) that iteratively updates the vector , where , and .
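To illustrate the kind of update that Adam performs, here is a minimal sketch on a toy objective: squared loss at three equally spaced one-dimensional points plus a second-difference roughness penalty. The penalty is a simple stand-in, not the paper's curvature regularizer, and the step size, number of iterations, and default moment parameters (0.9, 0.999, 1e-8, the defaults suggested by Kingma and Ba) are assumptions.

```python
import math

def adam_minimize(grad, theta, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=3000):
    """Plain Adam iteration on a list of parameters; grad returns a list."""
    theta = list(theta)
    m = [0.0] * len(theta)            # first-moment estimates
    v = [0.0] * len(theta)            # second-moment estimates
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)      # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

y = [0.0, 1.0, 0.5]                   # toy responses
lam = 1.0                             # tuning parameter of the toy penalty
stencil = [1.0, -2.0, 1.0]            # second-difference weights

def objective_grad(psi):
    """Gradient of sum (psi_i - y_i)^2 + lam * (psi_0 - 2 psi_1 + psi_2)^2."""
    r = sum(s * p for s, p in zip(stencil, psi))
    return [2.0 * (p - yi) + 2.0 * lam * r * s for p, yi, s in zip(psi, y, stencil)]

psi_hat = adam_minimize(objective_grad, [0.0, 0.0, 0.0])
print(psi_hat)
```

Increasing `lam` pulls the three fitted values toward a straight line, which mirrors the role of the tuning parameter in the regularized objective.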
3.3 DTL Advantages
Figure 5 shows a comparison between the DTL, decision tree, and multivariate adaptive regression splines (MARS), fitted on the same data generated from a saddle surface model, , where the noise follows the normal distribution . In comparison with the decision tree and MARS, the DTL appears to be smoother and more flexible in shape. The tree-based method tends to be rough when estimating smooth functionals, as it is not easy to smooth out the fit by using more complicated models in the terminal nodes. MARS shows better adaptivity in handling smooth nonlinear functions than decision trees. The DTL provides an intrinsic way to partition the feature space with the Delaunay triangulation algorithm, and it balances roughness and smoothness depending on the local availability of data, which can lead to better predictive accuracy. Figure 6 shows the behavior of the regularized DTL with different values of when fitting a two-dimensional linear model under a squared loss function.
4 Theoretical Properties
We first study the geometrical optimality and asymptotic properties of the Delaunay triangulation mesh under general random distributions, and then these properties are used as cornerstones to find the statistical properties of the DTL.
4.1 Geometrical Optimality Properties
We define a geometric loss function to analyze the geometric optimality of the Delaunay triangulation with random points.
Definition 4.1.
Let be points in in general position with a convex hull , and be any triangulation of . For any point , if , the geometric loss function of the points is defined as
[TABLE]
where are the vertices of the triangle in that covers . Otherwise, if , we define
Theorem 1.
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . Then,
[TABLE]
where is the Delaunay triangulation of .
Proof.
For any point and a simplex with vertices if , define the generalized geometric loss function as
[TABLE]
Otherwise, if By Theorem 1 in Rajan (1991), among all the triangles with vertices in that contain the point , the Delaunay triangle minimizes the geometric loss function of the triangle at the point. Thus, for any triangulation , we have
[TABLE]
Thus,
[TABLE]
∎
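The optimality result of Rajan (1991) used in the proof can be checked numerically on a small example. The sketch below assumes that the geometric loss at a point is the barycentric-weighted sum of squared distances to the vertices of the covering triangle, which is the quantity minimized by the Delaunay triangle in Rajan's theorem; the exact form of the loss in Definition 4.1 is not reproduced here. The four points and the query point are illustrative.

```python
from itertools import combinations

def barycentric(x, tri):
    """Barycentric coordinates of x in triangle tri, or None if tri is degenerate."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None
    l1 = ((y2 - y3) * (x[0] - x3) + (x3 - x2) * (x[1] - y3)) / det
    l2 = ((y3 - y1) * (x[0] - x3) + (x1 - x3) * (x[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def geometric_loss(x, tri):
    """Weighted squared distance from x to the vertices; None if x is not covered."""
    lam = barycentric(x, tri)
    if lam is None or min(lam) < -1e-12:
        return None
    return sum(l * ((x[0] - vx) ** 2 + (x[1] - vy) ** 2)
               for l, (vx, vy) in zip(lam, tri))

# A kite whose Delaunay triangulation uses the short diagonal B-D.
A, B, C, D = (0.0, 0.0), (2.0, -1.0), (4.0, 0.0), (2.0, 1.0)
x = (1.5, 0.2)

losses = {}
for tri in combinations([A, B, C, D], 3):
    value = geometric_loss(x, tri)
    if value is not None:
        losses[tri] = value

best = min(losses, key=losses.get)
print(best, losses[best])
```

Among the triangles with vertices in the point set that cover the query point, the one with the smallest loss is the Delaunay triangle (A, B, D), whose circumcircle does not contain C.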
4.2 Geometrical Asymptotic Properties
Since the DTL makes a prediction for a point by fitting a linear interpolation function locally on the Delaunay triangle that covers the point, the asymptotic behavior of the covering triangle is crucial to the asymptotics of DTL in both regression and classification problems. We show the convergence rate of the size of the covering triangle in terms of its average edge length.
Lemma 1.
Let take values on . Let denote the nearest neighbor of among . Then, for ,
[TABLE]
where and is the volume of a unit -ball; and for , it reduces to
[TABLE]
Proof.
Let denote the nearest neighbor of , and let Define the -ball Note that the -balls are disjoint, and , and we have that . Thus,
[TABLE]
where is the Lebesgue measure. Hence,
[TABLE]
For , it can be shown that
[TABLE]
which proves the case of . For , it is obvious that
[TABLE]
∎
Lemma 2.
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . Let denote the event that falls inside the convex hull of . Then, , as .
Proof.
Let , denote a series of -square-neighborhoods of the vertices of the , and Define the event . Then, given any arbitrary small value of , we have
[TABLE]
which implies , as Furthermore,
[TABLE]
which is guaranteed by the convexity of the convex hull. Thus, as ∎
Theorem 2.
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . If the point is inside the convex hull of , define as the average length of the edges of the Delaunay triangle that covers ; otherwise, , where is the nearest neighbor point of among . Then,
[TABLE]
where is a positive constant depending only on the dimension and density .
The proof of Theorem 2 contains two parts. Following the machinery of the 1-NN proof (Biau et al., 2015), the first part shows that the convergence rate of for the out-of-hull case is , and since the point falls inside the convex hull of with probability one, the convergence rate is determined by the inside-hull case. The second part of the proof follows the results of Jimenez and Yukich (2002).
Proof of Theorem 2.
Let denote the event that point falls inside the convex hull of , and by the definition of we have
[TABLE]
where is the average length of the edges of the Delaunay triangle that covers point , and if is outside the convex hull, define
We first show By adopting the techniques in Biau et al. (2015), we have
[TABLE]
where is a -ball with center and radius , and is the probability measure. By the Lebesgue differentiation theorem, at Lebesgue-almost all ,
[TABLE]
where is the volume of a standard -ball. At such , we have for fixed ,
[TABLE]
Furthermore, Fatou’s lemma implies
[TABLE]
To establish the upper bound for , we take an arbitrarily large constant and split
[TABLE]
By Fatou’s lemma, the first term can be bounded as follows,
[TABLE]
Then, we show that the second term is controlled by the choice of . By the symmetry of the nearest neighbor distance, we can rewrite as
[TABLE]
Denote , and it has been proved in Lemma 1 that
[TABLE]
Set , and we know that . By Jensen’s inequality, when ,
[TABLE]
Thus,
[TABLE]
As a result, we have shown for ,
[TABLE]
For , (4.1) leads to
[TABLE]
As we have By the Cauchy–Schwarz inequality,
[TABLE]
As shown in Lemma 2, , as . Thus,
Next, we show that Let denote the number of triangles in the Delaunay triangulation of and let denote the triangles. By definition, we show that where is the set of edges in the Delaunay graph that are connected with point , and is the length of edge . As shown in Lemma 2,
[TABLE]
as By the symmetry of triangle indices, we have
[TABLE]
Thus, we have Hence, by reindexing the points of
[TABLE]
where denotes the set of edges that are connected with point
By Theorem 2.1 in Jimenez and Yukich (2002), it is shown that for the Delaunay graph,
[TABLE]
where ’s are the distances between the points , to the original point and , are from a -dimensional unit intensity Poisson process with density
With and since in probability, we obtain
[TABLE]
where is a constant that depends only on dimension and density . ∎
Corollary 1.
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . If the point is inside the convex hull of , define as any one of the vertices of the covering Delaunay triangle; otherwise, define as the nearest neighbor point of Then, in probability.
Proof.
We need to show that If point is outside the convex hull of , , by definition. Otherwise, From Theorem 2, we conclude that , as which implies that in probability. ∎
This result shows that the vertices of the covering simplex of point all converge to in probability.
4.3 General Case
For regression problems, the DTL is shown to be consistent, and for classification problems, the DTL has an error rate smaller than or equal to , where is the Bayes error rate (the minimum error rate that can be achieved by any function approximator).
Regression
Theorem 3.
Assume that , and is continuous. Then, the DTL regression function estimate satisfies
[TABLE]
where is the minimal value of the risk over all continuous functions .
Proof.
If point is outside the convex hull , , where is the corresponding response of the nearest neighbor point If is inside , denote the simplex covering as and the vertices of the simplex as or simply Thus, for any interior point of simplex it can be represented as where , for , and As the DTL is linear inside the simplex, and where is the response to corresponding point In general, with .
Thus, we have
[TABLE]
From Corollary 1, we know that for each , in probability. By the continuity of function , the second term in (4.3) goes to zero, i.e., $\mathbb{E}\big|I_{n}\sum_{j=1}^{p+1}\lambda_{[j]}(\mathbb{X})\psi(\mathbb{X}_{[j]})-\psi(\mathbb{X})\big|^{2}\rightarrow 0$. For the third term in (4.3), the Cauchy–Schwarz inequality implies
[TABLE]
As , , then
[TABLE]
and $\mathbb{E}\big|I_{n}\sum_{j=1}^{p+1}\lambda_{[j]}(\mathbb{X})\psi(\mathbb{X}_{[j]})-\psi(\mathbb{X})\big|^{2}=o(1)$.
For the first term in (4.3), we have
[TABLE]
and again by the continuity of ,
[TABLE]
Since is the minimum value of the consistency is shown. ∎
Classification
Theorem 4.
For a two-class classification model, if the conditional probability is a continuous function of , then the misclassification risk of a DTL classifier is bounded as
[TABLE]
where is the Bayes error of the model.
Proof.
By Lemma 2, the point is covered by a Delaunay triangle with probability one. Denote the vertices of the covering triangle as Define as the empirical risk of the DTL classifier,
[TABLE]
where is the prediction of the DTL classifier, and Since Corollary 1 shows that for any , in probability, based on the continuity of , we have in probability. Thus, , in probability. The conditional Bayes risk By the symmetry of function in , we write
[TABLE]
By definition, the DTL risk and since is bounded by 1, applying the dominated convergence theorem, The limit yields
[TABLE]
As the Bayes risk is the expectation of , we have and thus, ∎
4.4 Smooth and Noiseless Case
As we consider the local behavior at , without loss of generality, we assume .
Lemma 3.
Assume that is a Lebesgue point of , . Denote as the vertices of the Delaunay triangle that covers the point , and further let denote these vertices ordered by their distances from . Then,
[TABLE]
where , and are independent standard exponential random variables, and are independent random vectors uniformly distributed on the surface of , a -ball with center and radius .
Proof.
Let be a positive constant to be chosen later. Consider a density related to as follows: let
[TABLE]
and set
[TABLE]
which is a proper density (i.e., nonnegative and integrating to one) for large enough. Note that
[TABLE]
Therefore, by Doeblin’s coupling method, there exist random variables and with corresponding densities and , such that
Repeating this times, we create two coupled samples of random variables that are within the sample. The sample is drawn from the distribution of , and the sample is drawn from the distribution of . Recall that the total variation distance between two random vectors , is defined by where denotes the Borel sets of Let and be the reordered samples. Then
[TABLE]
Define
[TABLE]
where is the probability measure of . We recall that are uniform order statistics, and thus In fact, this convergence is also in the sense. Also, if , then Thus,
[TABLE]
for all large enough, depending on Let be random vectors uniformly distributed on the surface of Then,
[TABLE]
as on has a radially symmetric distribution. Therefore,
[TABLE]
which concludes the proof.
Theorem 5.
Assume that is a Lebesgue point of the density and , and the regression function is continuously differentiable in a neighborhood of . Then,
- (1)
**
- (2)
**
Proof.
By Taylor’s series approximation,
[TABLE]
(where as . Observe that in probability, if in probability. But, by Lemma 3 , and therefore,
[TABLE]
Still, by Lemma 3,
[TABLE]
Define we have
[TABLE]
Thus, since
[TABLE]
which implies that
[TABLE]
∎
Theorem 6.
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . Then, for a function which has an upper-bounded second-order derivative, we have
[TABLE]
where is a DTL estimate of based on samples , is the upper bound of the second-order derivative, is a positive constant depending only on the dimension and density , and is a norm of the continuous functional space.
Proof.
Define as the Delaunay triangulation on , and as the Delaunay simplices, where is the number of simplices in . Define as the average length of the edges of the Delaunay triangle . By the -error bound obtained in the inequality (3.22) in Waldron (1998), for each simplex , we have
[TABLE]
where is the supremum of the second-order derivative of on simplex and is the maximum edge of . Then, it can be shown that
[TABLE]
where . Thus,
[TABLE]
and by Theorem 2, we have
[TABLE]
∎
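The role of the edge length in this bound can be illustrated with a one-dimensional analogue. For a twice-differentiable function with second derivative bounded by $M$, piecewise linear interpolation on cells of width $h$ has error at most $M h^2 / 8$; this classical bound is used here as a stand-in for the simplex bound of Waldron (1998), whose constants are not reproduced. The test function, interval, and grid sizes are illustrative assumptions.

```python
import math

def max_interp_error(f, a, b, n, probes=50):
    """Largest |f - piecewise-linear interpolant| on n equal cells of [a, b]."""
    h = (b - a) / n
    worst = 0.0
    for i in range(n):
        x0 = a + i * h
        f0, f1 = f(x0), f(x0 + h)
        for k in range(probes + 1):
            t = k / probes
            interpolant = (1.0 - t) * f0 + t * f1
            worst = max(worst, abs(f(x0 + t * h) - interpolant))
    return worst, h

M = 1.0                                   # bound on |sin''| = |sin|
results = {}
for n in (4, 8, 16):
    err, h = max_interp_error(math.sin, 0.0, math.pi, n)
    results[n] = (err, M * h * h / 8.0)
    print(n, err, M * h * h / 8.0)
```

Halving the cell width reduces the observed error by roughly a factor of four, consistent with a bound that is quadratic in the edge length.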
Theorem 7 (Local Adaptivity).
Let be i.i.d. samples from a continuous density , which is bounded away from zero and infinity on . Assume that the regression function is continuously differentiable in . Then, for any point , , as where is the gradient of the DTL at point
Proof.
For any fixed point , by Taylor series approximation,
[TABLE]
(where as . Observe that in probability, if in probability. Based on Theorem 2 and Corollary 1, we have , ∎
Based on the multivariate mean-value theorem, there exists a point such that As ,
4.5 Regularized Estimates
Under regularizations, the estimated parameter vector is given by
[TABLE]
To see how accurate the regularized estimate is, we compare with the population optimal solution
Definition 4.2.
Given the location parameter vector , the regularization function is said to be decomposable with respect to , if a pair of subspaces of , for all and , where
Definition 4.3.
The loss function satisfies a restricted strong convexity condition (RSCC) with curvature , if the optimal solution , for all , where .
Theorem 8.
Based on a strictly positive regularization constant , if the loss function satisfies RSCC, then we have and where
Proof.
We first show that the regularization function of DTL is decomposable, and then we use Theorem 1 and Corollary 1 in Negahban (2012) to prove the theorem. For a given set of parameters , define as its convex hull, and as the set of points that are connected to any point of by an edge of the Delaunay triangulation . Then, based on the definition of the geometric regularization function, we have for all and . Thus, if the loss function satisfies RSCC for such an , the conclusion follows.
∎
5 Comparisons of Statistical Learners
We use two artificial classification datasets to visualize how the DTL-based methods differ from the neural network and tree-based methods. In particular, we demonstrate the smoothness and robustness of the DTL-based methods, when handling feature interaction problems.
The artificial datasets are generated from two models of the Python scikit-learn package. The ‘moons’ model has two clusters of points with moon shapes, and the ‘circles’ model consists of two clusters of points distributed along two circles with different radii. As shown in Figures 7–8, the data are displayed with blue points for and red points for . Three classifiers are considered: the DTL, a neural network, and a decision tree. The estimated probability function of each classifier is plotted with color maps.
We first illustrate the local adaptivity of the DTL. As shown in panel (a) of Figures 7–8, the estimated probability functions of the DTL are piecewise linear in the convex hull of the observations. This feature gives the DTL an advantage when approximating a smooth boundary, since it can locally use a piecewise linear model to estimate the probability function. Figure 7 (b) exhibits a less adaptive classification probability (either 0 or 1) using the neural network, and Figure 7 (c) shows a clear staircase shape of the classification region using the decision tree.
We next point out that the DTL can more reliably capture the feature interactions. As exhibited in Figure 7 (c) and Figure 8 (c), there are some spiky regions in the estimated probability functions using the decision tree. This shows the weakness of the decision tree in capturing the feature interactions via the marginal tree splitting approach.
6 Discussion
We have presented the DTL as a differentiable and nonparametric algorithm that has simple geometric interpretations. The geometrical and statistical properties of the DTL are investigated to explain its advantages under various settings, including general regression, classification models, and smooth and noiseless models. These properties motivate developing more applications based on this piecewise linear learner, e.g., the DTL can be used as a substitute for the nonlinear activation function in the neural network. The DTL can also be generalized into manifold learning approaches, with multi-dimensional input and output. A number of open questions warrant further investigation. In terms of theory, we have focused on low-dimensional settings () for simplicity, while it would be more interesting to generalize the DTL into high-dimensional settings. As for computation, investigation on GPU-based parallel computing for DTL is also warranted.
References
- Archer, K., Kimes, R. (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis 52, 2249–2260.
- Betancourt, M. R., Skolnick, J. (2001). Universal similarity measure for comparing protein structures. Biopolymers 59, 305–309.
- Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research 9, 2015–2033.
- Biau, G., Devroye, L. and Lugosi, G. (2015). Lectures on the nearest neighbor method. Springer.
- Blaser, R., Fryzlewicz, P. (2016). Random rotation ensembles. Journal of Machine Learning Research 17, 1–26.
- Bowyer, A. (1981). Computing Dirichlet tessellations. The Computer Journal 24, 162–166.
- Bramble, J.H., Zlamal, M. (1970). Triangular elements in the finite element methods. Mathematics of Computation 24, 809–820.
- Breiman, L., Friedman, J., Olshen, R., Stone, C. (1983). Classification and Regression Trees. Wadsworth, Belmont, CA.
