The VC Dimension of Metric Balls under Fréchet and Hausdorff Distances
Anne Driemel, André Nusser, Jeff M. Phillips, Ioannis Psarros

TL;DR
This paper investigates the VC dimension of metric balls under Fréchet and Hausdorff distances for sets of polygonal curves, providing bounds that facilitate efficient sampling-based algorithms in computational geometry.
Contribution
It derives upper and lower bounds on the VC dimension for set systems of polygonal curves with respect to these metrics, advancing understanding of their complexity.
Findings
Upper bounds are near-quadratic or near-linear in curve complexity.
Bounds are logarithmic in the complexity of the ground set curves.
Results enable improved sampling bounds for large sets of simple curves.
Anne Driemel: University of Bonn. Driemel thanks the Hausdorff Center for Mathematics for their generous support and the Netherlands Organization for Scientific Research (NWO) for support under Veni Grant 10019853.
André Nusser: Max Planck Institute for Informatics & Saarbrücken Graduate School of Computer Science.
Jeff M. Phillips: University of Utah. Phillips acknowledges support from NSF CCF-1350888, ACI-1443046, CNS-1514520, CNS-1564287, and IIS-1816149. Part of the work was completed while visiting the Simons Institute for Theory of Computing.
Ioannis Psarros: National & Kapodistrian University of Athens. This research is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme Human Resources Development, Education and Lifelong Learning in the context of the project “Strengthening Human Resources Research Potential via Doctorate Research” (MIS-5000432), implemented by the State Scholarships Foundation (IKY).
© Anne Driemel, Jeff M. Phillips, Ioannis Psarros
ACM CCS: Theory of computation → Randomness, geometry and discrete structures; Theory of computation → Computational geometry
35th International Symposium on Computational Geometry (SoCG 2019), June 18–21, 2019, Portland, United States. Editors: Gill Barequet and Yusu Wang. LIPIcs Volume 129.
Abstract
The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is small, then knowing this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set is a set of polygonal curves in $\mathbb{R}^d$ and the sets are metric balls defined by curve similarity metrics, such as the Fréchet distance and the Hausdorff distance, as well as their discrete counterparts. We derive upper and lower bounds on the VC dimension that imply useful sampling bounds in the setting that the number of curves is large, but the complexity of the individual curves is small. Our upper bounds are either near-quadratic or near-linear in the complexity of the curves that define the ranges and they are logarithmic in the complexity of the curves that define the ground set.
Keywords: VC dimension, Fréchet distance, Hausdorff distance
1 Introduction
A range space (also called set system) is defined by a ground set $X$ and a set of ranges $\mathcal{R}$, where each $r \in \mathcal{R}$ is a subset of $X$. A data structure for range searching answers queries for the subset of the input data that lies inside the query range. In range counting, we are interested only in the size of this subset. In our setting, a range is a metric ball defined by a curve and a radius. The ball contains all curves that lie within this radius from the center under a specific distance function (e.g., Fréchet or Hausdorff distance).
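The abstract motivates VC dimension bounds through sampling. As a hedged illustration (not the paper's data structure), the sketch below contrasts exact range counting by linear scan with an estimate from a uniform random sample. The sample size has the shape of the standard ε-approximation bound $O((\nu + \log(1/\varphi))/\varepsilon^2)$ for VC dimension $\nu$ and failure probability $\varphi$; the constant `c = 0.5` and all function names are hypothetical, and the ranges are balls on the real line (intervals) rather than balls of curves.

```python
import math
import random


def exact_range_count(ground, in_range):
    """Exact range counting by a linear scan over the ground set."""
    return sum(1 for x in ground if in_range(x))


def sample_size(vc_dim, eps, fail_prob, c=0.5):
    """Sample size c * (vc_dim + ln(1/fail_prob)) / eps^2.

    The shape follows the eps-approximation bound; the constant c is a
    hypothetical placeholder, not a value taken from the paper.
    """
    return math.ceil(c * (vc_dim + math.log(1.0 / fail_prob)) / eps ** 2)


def approx_range_count(ground, in_range, s, rng):
    """Estimate the range count by scaling up the hit rate of a uniform sample."""
    sample = rng.sample(ground, min(s, len(ground)))
    hits = sum(1 for x in sample if in_range(x))
    return len(ground) * hits / len(sample)


def in_ball(x, center=0.3, radius=0.2):
    """Membership in a metric ball on the real line, i.e. an interval."""
    return abs(x - center) <= radius


rng = random.Random(0)
ground = [rng.uniform(0.0, 1.0) for _ in range(10_000)]
s = sample_size(vc_dim=2, eps=0.05, fail_prob=0.01)  # intervals have VC dimension 2
exact = exact_range_count(ground, in_ball)
estimate = approx_range_count(ground, in_ball, s, rng)
print(s, exact, round(estimate))
```

Because an ε-approximation guarantee holds for all ranges simultaneously, one sample can answer many queries; this is where a small VC dimension pays off.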
A crucial descriptor of any range space is its VC-dimension [42, 40, 38] and the related shattering dimension, which we define formally below. These notions quantify how complex a range space is and have played fundamental roles in machine learning [43, 7], data structures [15], and geometry [28, 12]. For instance, specific bounds on these complexity parameters are critical for tasks as diverse as learning with neural networks [7, 33], art-gallery problems [41, 24, 34], and kernel density estimation [32].
The last five years have seen a surge of interest in data structures for trajectory processing under the Fréchet distance, manifested in a series of publications [18, 27, 19, 2, 44, 9, 22, 13, 21, 8, 23]. This interest is partially motivated by the increasing availability and quality of trajectory data from mobile phones, GPS sensors, RFID technology and video analysis [35, 45, 26]. Initial results in this line of research, such as the approximate range counting data structure by de Berg, Gudmundsson and Cook [18], use classical data structuring techniques. Afshani and Driemel extended their results and in addition showed lower bounds on the space-query-time trade-off in this setting [2]. In particular, they showed a lower bound for exact range searching that is exponential in the complexity of the curves. In 2017, ACM SIGSPATIAL, the premier conference for geographic information science, devoted its software challenge (GIS CUP) to the problem of range searching under the Fréchet distance [44]. In further developments, the most recent results explore the use of heuristics [11] and randomization [14].
The Fréchet distance is a popular distance measure for curves. Intuitively, it can be defined using the metaphor of a person walking a dog, where the person follows one curve and the dog follows the other curve, and throughout their traversal they are connected by a leash of fixed length. The Fréchet distance corresponds to the length of the shortest dog leash that permits a traversal in this fashion. The Fréchet distance is very similar to the Hausdorff distance for sets, which is defined as the minimal maximum distance of a pair of points, one from each set, under all possible mappings between the two sets. The difference between the two distance measures is that the Fréchet distance requires the mapping to adhere to the ordering of the points along the curve. Both distance measures allow flexible associations between parts of the input elements, which sets them apart from classical distances and makes them well suited to trajectory data recorded at varying speeds.
Our contribution in this paper is a comprehensive analysis of the Vapnik-Chervonenkis dimension of the corresponding range spaces. The resulting VC dimension bounds, while being interesting in their own right, have a plethora of applications through the implied sampling bounds. We detail a range of implications of our bounds in Section 10.
2 Definitions
In this section, we formally define the distances between curves as well as the VC dimension and range spaces, so that we can state our main results. This basic setup suffices to prove our results for the discrete variants of the distance measures we consider. The proofs in the discrete setting also serve as a template for the proofs in the main part of the paper. Starting in Section 6, we provide more advanced geometric definitions and properties of the VC dimension, which we then use in our proofs for the continuous variants of the distance measures we consider.
2.1 Distance measures
In the following, we define the Hausdorff distance, the discrete and the continuous Fréchet distance, and the Weak Fréchet distance. We denote by $\|\cdot\|$ the Euclidean norm.
Definition 2.1 (Directed Hausdorff distance).
Let $X, Y$ be two subsets of some metric space $(\mathcal{M}, \rho)$. The directed Hausdorff distance from $X$ to $Y$ is
$$\vec{d}_H(X, Y) = \sup_{x \in X} \inf_{y \in Y} \rho(x, y).$$
Definition 2.2 (Hausdorff distance).
Let $X, Y$ be two subsets of some metric space $(\mathcal{M}, \rho)$. The Hausdorff distance between $X$ and $Y$ is
$$d_H(X, Y) = \max\{\vec{d}_H(X, Y),\ \vec{d}_H(Y, X)\}.$$
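For finite point sets the supremum and infimum become a maximum and a minimum. A minimal sketch, assuming Euclidean points given as tuples (the function names are hypothetical); it does not handle continuous curves, whose Hausdorff distance also involves points in the interior of edges.

```python
import math


def directed_hausdorff(X, Y):
    """Largest distance from a point of X to its nearest point in Y."""
    return max(min(math.dist(x, y) for y in Y) for x in X)


def hausdorff(X, Y):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(X, Y), directed_hausdorff(Y, X))


X = [(0.0, 0.0)]
Y = [(0.0, 0.0), (3.0, 4.0)]
print(directed_hausdorff(X, Y), directed_hausdorff(Y, X), hausdorff(X, Y))  # 0.0 5.0 5.0
```

The example shows that the directed distance is not symmetric, which is why Definition 2.2 takes the maximum of both directions.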
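Definition 2.3.
Given polygonal curves $P$ and $Q$ with vertices $p_1, \ldots, p_{m_1}$ and $q_1, \ldots, q_{m_2}$ respectively, a traversal is a sequence $T = ((i_1, j_1), \ldots, (i_t, j_t))$ of pairs of indices referring to a pairing of vertices from the two curves such that:
1. $i_1 = j_1 = 1$, $i_t = m_1$, $j_t = m_2$.
2. $i_{l+1} - i_l \in \{0, 1\}$ and $j_{l+1} - j_l \in \{0, 1\}$ for all $1 \le l < t$.
3. $(i_{l+1} - i_l) + (j_{l+1} - j_l) \ge 1$ for all $1 \le l < t$.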
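The three conditions can be checked mechanically. A small sketch, assuming a traversal is given as a Python list of 1-based index pairs; `is_traversal` is a hypothetical helper, not part of the paper.

```python
def is_traversal(T, m1, m2):
    """Check conditions 1-3 of Definition 2.3 for a list T of 1-based index pairs."""
    # Condition 1: start at the first pair of vertices and end at the last pair.
    if not T or T[0] != (1, 1) or T[-1] != (m1, m2):
        return False
    for (i, j), (i_next, j_next) in zip(T, T[1:]):
        di, dj = i_next - i, j_next - j
        # Condition 2: each index advances by 0 or 1.
        if di not in (0, 1) or dj not in (0, 1):
            return False
        # Condition 3: at least one index advances.
        if di + dj < 1:
            return False
    return True


print(is_traversal([(1, 1), (2, 1), (2, 2), (3, 2)], 3, 2))  # True
print(is_traversal([(1, 1), (3, 2)], 3, 2))  # False: index i jumps by 2
```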
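Definition 2.4 (Discrete Fréchet distance).
Given polygonal curves $P$ and $Q$ with vertices $p_1, \ldots, p_{m_1}$ and $q_1, \ldots, q_{m_2}$ respectively, we define the discrete Fréchet distance between $P$ and $Q$ as the following function:
$$d_{dF}(P, Q) = \min_{T \in \mathcal{T}} \max_{(i, j) \in T} \|p_i - q_j\|,$$
where $\mathcal{T}$ denotes the set of all possible traversals for $P$ and $Q$.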
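The minimum over traversals need not be computed by enumerating them: the well-known quadratic dynamic program (due to Eiter and Mannila) fills a table indexed by vertex pairs. A minimal sketch, assuming Euclidean vertices given as tuples; the function name is hypothetical. The usage example reverses a curve: the two vertex sets coincide, so their Hausdorff distance is 0, while the discrete Fréchet distance is positive because traversals must respect the order of the vertices.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance of vertex sequences P and Q (0-based DP table).

    D[i, j] is the smallest possible maximum vertex distance over all traversal
    prefixes that end by pairing P[i] with Q[j].
    """
    D = {}
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            if i == j == 0:
                prev = 0.0
            else:
                # A traversal reaches (i, j) from (i-1, j), (i, j-1) or (i-1, j-1).
                prev = min(D.get((i - 1, j), math.inf),
                           D.get((i, j - 1), math.inf),
                           D.get((i - 1, j - 1), math.inf))
            D[i, j] = max(math.dist(p, q), prev)
    return D[len(P) - 1, len(Q) - 1]


P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = P[::-1]  # same vertices, opposite order
print(set(P) == set(Q), discrete_frechet(P, Q))  # True 2.0
```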
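Any polygonal curve with vertices $p_1, \ldots, p_m$ and edges $\overline{p_1 p_2}, \ldots, \overline{p_{m-1} p_m}$ has a uniform parametrization that allows us to view it as a parametrized curve $P: [0, 1] \to \mathbb{R}^d$.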
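Definition 2.5 (Fréchet distance).
Given two parametrized curves $P, Q: [0, 1] \to \mathbb{R}^d$, their Fréchet distance is defined as follows:
$$d_F(P, Q) = \inf_{f, g} \max_{t \in [0, 1]} \|P(f(t)) - Q(g(t))\|,$$
where $f$ and $g$ range over all continuous, non-decreasing functions $f, g: [0, 1] \to [0, 1]$ with $f(0) = g(0) = 0$ and $f(1) = g(1) = 1$.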
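Computing $d_F$ exactly calls for the free-space approach of Alt and Godau [6], which is beyond a short example. A simpler, hedged sketch: subdividing every edge does not change the curve that is traced, and for polygonal curves the discrete Fréchet distance of their vertex sequences upper-bounds the continuous Fréchet distance (a discrete traversal extends to a continuous one by moving linearly along the edges, and by convexity of the norm the distance along each step never exceeds its value at the step's endpoints). Since the discrete distance exceeds the continuous one by at most the longest edge length, the bound approaches $d_F$ as the subdivision gets finer. The helper names and the per-edge subdivision count `s` are assumptions for illustration.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance by dynamic programming (0-based indices)."""
    D = {}
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            prev = 0.0 if i == j == 0 else min(
                D.get((i - 1, j), math.inf),
                D.get((i, j - 1), math.inf),
                D.get((i - 1, j - 1), math.inf))
            D[i, j] = max(math.dist(p, q), prev)
    return D[len(P) - 1, len(Q) - 1]


def subdivide(P, s):
    """Insert s - 1 equally spaced points into every edge, keeping all vertices."""
    out = [P[0]]
    for a, b in zip(P, P[1:]):
        for t in range(1, s):
            out.append(tuple(ai + (bi - ai) * t / s for ai, bi in zip(a, b)))
        out.append(b)
    return out


def frechet_upper_bound(P, Q, s):
    """Upper bound on d_F(P, Q): discrete Fréchet distance of the subdivided curves."""
    return discrete_frechet(subdivide(P, s), subdivide(Q, s))


P = [(0.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # same segment, extra vertex: d_F(P, Q) = 0
for s in (1, 2, 4):
    print(s, frechet_upper_bound(P, Q, s))  # prints 1 1.0, 2 0.5, 4 0.25 on three lines
```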
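Definition 2.6 (Weak Fréchet distance).
Given two parametrized curves $P, Q: [0, 1] \to \mathbb{R}^d$, their Weak Fréchet distance is defined as follows:
$$d_{wF}(P, Q) = \inf_{f, g} \max_{t \in [0, 1]} \|P(f(t)) - Q(g(t))\|,$$
where $f$ and $g$ range over all continuous functions $f, g: [0, 1] \to [0, 1]$ with $f(0) = g(0) = 0$ and $f(1) = g(1) = 1$.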
2.2 Range spaces
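Each range space can be defined as a pair of sets $(X, \mathcal{R})$, where $X$ is the ground set and $\mathcal{R}$ is the range set. Let $(X, \mathcal{R})$ be a range space. For $Y \subseteq X$, we denote:
$$\mathcal{R}_{|Y} = \{ r \cap Y \mid r \in \mathcal{R} \}.$$
If $\mathcal{R}_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $\mathcal{R}$.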
Definition 2.7 (Vapnik-Chervonenkis dimension).
The Vapnik-Chervonenkis dimension [38, 40, 42] (VC dimension) of $(X, \mathcal{R})$ is the maximum cardinality of a shattered subset of $X$.
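To make shattering and the VC dimension concrete, the following brute-force sketch uses a textbook range space that is not from the paper: distinct points on the real line with closed intervals as ranges. An interval cuts out either the empty set or a contiguous run of the sorted points, so every trace can be enumerated; the function names are hypothetical.

```python
from itertools import combinations


def interval_traces(Y):
    """All subsets of the finite point set Y that closed intervals cut out."""
    pts = sorted(Y)
    traces = {frozenset()}  # an interval that misses every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            traces.add(frozenset(pts[i:j + 1]))  # a contiguous run of sorted points
    return traces


def is_shattered(Y):
    """Y is shattered iff the traces include all 2^|Y| subsets of Y."""
    return len(interval_traces(Y)) == 2 ** len(Y)


def vc_dimension(X):
    """Largest size of a subset of the finite set X that intervals shatter."""
    best = 0
    for size in range(1, len(X) + 1):
        if any(is_shattered(Y) for Y in combinations(X, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so no larger set can be
    return best


print(is_shattered((1, 2)), is_shattered((1, 2, 3)), vc_dimension([1, 2, 3, 4, 5]))  # True False 2
```

Three points are never shattered because no interval picks the two outer points without the middle one.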
Definition 2.8 (Shattering dimension).
The shattering dimension of $(X, \mathcal{R})$ is the smallest $\delta$ such that, for all $m$,
$$\max_{B \subseteq X,\ |B| = m} |\mathcal{R}_{|B}| = O(m^{\delta}).$$
It is well known that a range space with VC dimension $\nu$ and shattering dimension $\delta$ satisfies $\delta \le \nu$ and $\nu = O(\delta \log \delta)$. So bounding the shattering dimension and bounding the VC dimension are asymptotically equivalent up to a logarithmic factor. For a proof of this and other basic facts on range spaces we refer the reader to the textbook of Har-Peled [28].
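Continuing the interval example (again an illustration, not a range space from the paper), one can count the traces on $m$ points and compare them with the Sauer–Shelah bound $\sum_{i=0}^{\nu} \binom{m}{i}$, which is what yields $\delta \le \nu$. For intervals ($\nu = 2$) the count is $1 + m(m+1)/2 = O(m^2)$, so the shattering dimension is also 2 and the bound is met with equality. Function names are hypothetical.

```python
from math import comb


def shatter_function_intervals(m):
    """Number of distinct traces that intervals cut out on m distinct points of the line.

    Only the order of the points matters, so the points 0, ..., m - 1 suffice.
    """
    traces = {frozenset()}
    for i in range(m):
        for j in range(i, m):
            traces.add(frozenset(range(i, j + 1)))
    return len(traces)


def sauer_bound(m, nu):
    """Sauer-Shelah bound on the number of traces for VC dimension nu."""
    return sum(comb(m, i) for i in range(nu + 1))


for m in range(1, 6):
    print(m, shatter_function_intervals(m), sauer_bound(m, 2), 1 + m * (m + 1) // 2)
```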
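Definition 2.9 (Dual range space).
Given a range space $(X, \mathcal{R})$, for any $p \in X$, we define
$$\mathcal{R}_p = \{ r \in \mathcal{R} \mid p \in r \}.$$
The dual range space of $(X, \mathcal{R})$ is the range space $(\mathcal{R}, \{ \mathcal{R}_p \mid p \in X \})$.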
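For a finite range space the dual can be built directly from Definition 2.9. In this hedged sketch, ranges are stored in a dictionary keyed by hypothetical range names, so the ground set of the dual is the set of names; points with equal dual ranges collapse into a single range of the dual.

```python
def dual_range_space(X, ranges):
    """Return the dual (ground set, range set) of a finite range space.

    `ranges` maps a range name to the frozenset of elements of X it contains.
    For each p in X, the dual range R_p is the set of names of ranges containing p.
    """
    dual_ground = frozenset(ranges)
    dual_ranges = {frozenset(name for name, r in ranges.items() if p in r) for p in X}
    return dual_ground, dual_ranges


X = {1, 2, 3}
ranges = {"a": frozenset({1, 2}), "b": frozenset({2, 3}), "c": frozenset({3})}
ground, dual = dual_range_space(X, ranges)
print(sorted(sorted(R) for R in dual))  # [['a'], ['a', 'b'], ['b', 'c']]
```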
It is a well-known fact that if a range space has VC dimension $\nu$, then the dual range space has VC dimension at most $2^{\nu+1}$ (see e.g. [28]).
There are many techniques for bounding the VC dimension of geometric range spaces. For instance, when the ground set is $\mathbb{R}^d$ and the ranges are defined by inclusion in halfspaces, the range space and its dual range space are isomorphic and both have VC dimension $d+1$ and shattering dimension $d$. When the ranges are defined by inclusion in balls, the VC dimension and the shattering dimension are both $d+1$, and the dual range space has VC dimension and shattering dimension at most $d+1$ [28]. It is also known [10] that the ranges formed as the $k$-fold union or intersection of ranges from a range space with VC dimension $\nu$ induce a range space with VC dimension $O(\nu k \log k)$, and it was recently shown by Csikós et al. that this is tight even for some simple range spaces, such as those defined by halfspaces [16, 17]. More such results are deferred to Section 6.
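The effect of composing ranges can be checked by brute force in a simple case that is not taken from the paper: unions of at most $k$ intervals on the line. A point subset is cut out by such a union exactly when it consists of at most $k$ maximal runs of consecutive points, so $n$ points are shattered iff $\lceil n/2 \rceil \le k$, giving VC dimension $2k$. This is consistent with the general $O(\nu k \log k)$ bound for $\nu = 2$, here without the logarithmic factor. Function names and the search limit `n_max` are hypothetical.

```python
from itertools import combinations


def runs(sorted_pts, subset):
    """Number of maximal runs of consecutive points (in sorted order) lying in subset."""
    count, inside = 0, False
    for p in sorted_pts:
        if p in subset and not inside:
            count += 1  # a new run starts here
        inside = p in subset
    return count


def shattered_by_unions(pts, k):
    """True iff every subset of pts is cut out by a union of at most k intervals."""
    pts = sorted(pts)
    return all(runs(pts, set(S)) <= k
               for size in range(len(pts) + 1)
               for S in combinations(pts, size))


def vc_unions_of_intervals(k, n_max=12):
    """Largest n <= n_max such that n points are shattered (positions do not matter)."""
    return max(n for n in range(n_max + 1) if shattered_by_unions(range(n), k))


print([vc_unions_of_intervals(k) for k in (1, 2, 3)])  # [2, 4, 6]
```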
2.3 Range spaces induced by distance measures
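Let $(X, \rho)$ be a pseudometric space. We define the ball of radius $r$ and center $c$, under the distance measure $\rho$, as the following set:
$$b_\rho(c, r) = \{ x \in X \mid \rho(c, x) \le r \},$$
where $c \in X$ and $r \in \mathbb{R}_+$. The doubling dimension of a metric space $(X, \rho)$, denoted as $\mathrm{ddim}(X)$, is the smallest integer $t$ such that any ball can be covered by at most $2^t$ balls of half the radius.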
In this paper, we study the VC dimension of variants of range spaces induced by pseudometric spaces $(X, \rho)$¹ by setting the ground set to $X$ and the range set to
$$\mathcal{R}_\rho = \{ b_\rho(c, r) \mid r \in \mathbb{R}_+,\ c \in X \}.$$
¹ While we may use the term metric or pseudometric to define the range, our methods do not assume any metric properties of the inducing distance measure.
It is a reasonable question to ask whether the doubling dimension of a metric space influences the VC dimension of the induced range space. In general, a bounded doubling dimension does not imply a bounded VC dimension of the induced range space, and vice versa. Recently, Huang et al. [31] showed that if we allow a small multiplicative distortion of the distance function $\rho$, the shattering dimension can be upper bounded in terms of the doubling dimension and the distortion parameter. It is conceivable that the doubling dimension of the metric space of the discrete Fréchet distance and the Hausdorff distance is bounded, as long as the underlying metric has bounded doubling dimension. However, for the continuous Fréchet distance, the doubling dimension is known to be unbounded [20]. Moreover, we will see that much better bounds can be obtained by a careful study of the specific distance measure.
Specifically, we study an unbalanced version of the above range space, in the sense that we distinguish between the complexity of objects of the ground set and the complexity of objects defining the ranges. To this end, we define, for any integers $d$ and $m$, $\mathbb{X}^d_m = (\mathbb{R}^d)^m$, and we treat the elements of this set as ordered sets of points in $\mathbb{R}^d$ of size $m$. Formally, we study range spaces with ground set $\mathbb{X}^d_m$ and a range set of the form
$$\mathcal{R}_{\rho, k} = \{ b_\rho(c, r) \mid r \in \mathbb{R}_+,\ c \in \mathbb{X}^d_k \}, \quad \text{where } b_\rho(c, r) = \{ x \in \mathbb{X}^d_m \mid \rho(c, x) \le r \},$$
under different variants of the Fréchet and the Hausdorff distance. We emphasize that the range space consists of ranges of all radii.
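Although the range space contains balls of all radii, only finitely many distinct ranges appear on a finite ground set: for a fixed center, the trace changes only when the radius passes one of the distances from the center to the ground-set elements. A hedged sketch of this observation, assuming a finite candidate set of centers (in the paper, centers range over all of $\mathbb{X}^d_k$) and, as an example distance, the Hausdorff distance between finite sets of reals ($d = 1$, $m = k = 1$). All names are hypothetical.

```python
def hausdorff_1d(A, B):
    """Hausdorff distance between two finite sets of reals."""
    return max(max(min(abs(a - b) for b in B) for a in A),
               max(min(abs(a - b) for a in A) for b in B))


def ball_traces(ground, centers, rho):
    """Distinct sets b_rho(c, r) restricted to a finite ground set, over all r >= 0.

    For each center it suffices to try the radii equal to the distances to the
    ground-set elements, plus the empty range if every distance is positive.
    """
    traces = set()
    for c in centers:
        dist = {x: rho(c, x) for x in ground}
        for r in set(dist.values()):
            traces.add(frozenset(x for x in ground if dist[x] <= r))
        if min(dist.values()) > 0:
            traces.add(frozenset())
    return traces


ground = [(0.0,), (1.0,), (3.0,)]  # three "curves" with a single vertex each
print(len(ball_traces(ground, ground, hausdorff_1d)))              # 6
print(len(ball_traces(ground, ground + [(10.0,)], hausdorff_1d)))  # 7
```

Adding a far-away center contributes the empty range, yet neither count reaches $2^3 = 8$, so these three elements are not shattered. Counting such traces as the centers vary is exactly what the shatter function in Definition 2.8 measures.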
3 Our Results
Table 3 shows an overview of our bounds. For metric balls defined on point sets (resp. point sequences) in $\mathbb{R}^d$, we show that the VC dimension is at most near-linear in $k$, the complexity of the ball centers that define the ranges, and at most logarithmic in $m$, the complexity of the point sets of the ground set. Our lower bounds show that these bounds are almost tight in all parameters $d$, $k$, and $m$. For the Hausdorff distance, where the ground set consists of sets of line segments in $\mathbb{R}^d$, we show an upper bound that is quadratic in $k$, quadratic in $d$, and logarithmic in $m$. The same bound holds for the Fréchet distance, where the ground set consists of continuous polygonal curves in $\mathbb{R}^d$. We obtain slightly better bounds for the Weak Fréchet distance. Our lower bounds extend to the continuous case, but are only tight in their dependence on $m$, the complexity of the ground set.
References
- [1] Peyman Afshani and Anne Driemel. On the complexity of range searching among curves. CoRR, arXiv:1707.04789v1, 2017.
- [2] Peyman Afshani and Anne Driemel. On the complexity of range searching among curves. In Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018), New Orleans, LA, USA, January 7-10, 2018, pages 898–917, 2018.
- [3] S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. BlinkDB: queries with bounded errors and bounded response times on very large data. In EuroSys, 2013.
- [4] Yohji Akama, Kei Irie, Akitoshi Kawamura, and Yasutaka Uwano. VC dimension of principal component analysis. Discrete & Computational Geometry, 44:589–598, 2010.
- [5] Helmut Alt, Bernd Behrends, and Johannes Blömer. Approximate matching of polygonal shapes. Annals of Mathematics and Artificial Intelligence, 13(3):251–265, Sep 1995.
- [6] Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications, 05:75–91, 1995.
- [7] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
- [8] Maria Astefanoaei, Paul Cesaretti, Panagiota Katsikouli, Mayank Goswami, and Rik Sarkar. Multi-resolution sketches and locality sensitive hashing for fast trajectory processing. In International Conference on Advances in Geographic Information Systems (SIGSPATIAL 2018), volume 10, 2018.
