Efficient Classification for Metric Data
Lee-Ad Gottlieb, Aryeh Kontorovich, Robert Krauthgamer

TL;DR
This paper introduces an efficient classification algorithm for data in general metric spaces. Its runtime and accuracy depend on the doubling dimension of the data, and its generalization performance is backed by theoretical bounds.
Contribution
It develops a new algorithm for metric space classification based on approximate Lipschitz extension and nearest neighbor search, with guarantees on efficiency and generalization.
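To make the idea of classifying by Lipschitz extension concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it evaluates the classical exact Lipschitz extension by a brute-force scan over the sample, whereas the paper's contribution is an approximate and faster evaluation. The function names (`edit_distance`, `fit_lipschitz_constant`, `lipschitz_extension`, `classify`) and the choice of string edit distance as the metric are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Sample = Sequence[Tuple[T, int]]  # pairs (point, label) with label in {-1, +1}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an example metric on strings."""
    prev: List[int] = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fit_lipschitz_constant(sample: Sample, dist: Callable[[T, T], float]) -> float:
    """Smallest Lipschitz constant consistent with the +/-1 labels.

    Labels of opposite sign differ by 2, so L = 2 / (minimum distance
    between oppositely labelled points). Assumes both classes are present
    and that no two oppositely labelled points coincide.
    """
    margin = min(dist(p, q)
                 for p, yp in sample for q, yq in sample
                 if yp == 1 and yq == -1)
    return 2.0 / margin


def lipschitz_extension(x: T, sample: Sample,
                        dist: Callable[[T, T], float], L: float) -> float:
    """Average of the largest and smallest L-Lipschitz extensions at x."""
    upper = min(y + L * dist(x, p) for p, y in sample)
    lower = max(y - L * dist(x, p) for p, y in sample)
    return 0.5 * (upper + lower)


def classify(x: T, sample: Sample, dist: Callable[[T, T], float]) -> int:
    """Predict +1 or -1 from the sign of the extension (exact, linear scan)."""
    L = fit_lipschitz_constant(sample, dist)
    return 1 if lipschitz_extension(x, sample, dist, L) >= 0 else -1
```

For example, with a hypothetical sample `[("kitten", 1), ("mitten", 1), ("sitten", 1), ("dog", -1), ("dig", -1)]`, calling `classify("bitten", sample, edit_distance)` falls on the positive side and `classify("dug", sample, edit_distance)` on the negative side. Each prediction here costs one distance computation per sample point, which is the cost that an approximate nearest neighbor structure is meant to reduce.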
Findings
The algorithm's runtime and accuracy depend on the doubling dimension of the data.
Experimental results show superiority over some kernel methods.
Provides sharper risk bounds for nearest neighbor classifiers.
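For readers unfamiliar with the term, the doubling dimension referred to in these findings is usually defined as follows. This is the standard textbook definition, stated here for convenience; the symbols $\lambda$, $B(x, r)$, and $\operatorname{ddim}$ are notation chosen for this note and may differ from the paper's own.

```latex
% Doubling constant: the smallest number of half-radius balls needed
% to cover any ball in the metric space (\mathcal{X}, d).
\lambda(\mathcal{X}) = \min\bigl\{ \lambda \in \mathbb{N} :
  \text{every ball } B(x, r) \subseteq \mathcal{X}
  \text{ can be covered by } \lambda \text{ balls of radius } r/2 \bigr\},
\qquad
\operatorname{ddim}(\mathcal{X}) = \log_2 \lambda(\mathcal{X}).
```

Under this definition, $d$-dimensional Euclidean space has doubling dimension proportional to $d$, so the notion extends the idea of dimension to metrics such as edit distance that have no vector-space structure.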
Abstract
Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering…
