Efficient Classification for Metric Data

Lee-Ad Gottlieb; Aryeh Kontorovich; Robert Krauthgamer

arXiv:1306.2547·cs.LG·July 14, 2014

TL;DR

This paper introduces a classification algorithm for data in general metric spaces whose runtime and accuracy depend on the doubling dimension of the data, and it provides bounds on the generalization error.

Contribution

The paper develops a new algorithm for classification in general metric spaces, built on approximate (rather than exact) Lipschitz extension and nearest neighbor search, with guarantees on efficiency and generalization.
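To make the Lipschitz-extension idea concrete, here is a minimal sketch of the exact, brute-force classifier that the approximate algorithm is meant to speed up. It relies on the known observation from the von Luxburg and Bousquet line of work that thresholding a Lipschitz extension of the labels behaves like a 1-nearest-neighbor rule: a point is labeled by whichever class contains its closer training point. The function names `edit_distance` and `lipschitz_classify`, the toy sample, and the choice of string edit distance as the metric are assumptions for illustration; this is not the paper's approximate procedure.

```python
from typing import Callable, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def lipschitz_classify(
    x: str,
    sample: Sequence[Tuple[str, int]],
    dist: Callable[[str, str], float],
) -> int:
    """Return +1 if x is at least as close to the positive class as to
    the negative class, else -1. Assumes both classes are present in
    the sample. Brute force: one distance evaluation per training point."""
    d_pos = min(dist(x, p) for p, y in sample if y == +1)
    d_neg = min(dist(x, p) for p, y in sample if y == -1)
    return 1 if d_pos <= d_neg else -1


# Hypothetical toy sample of labeled strings.
sample = [("kitten", +1), ("mitten", +1), ("dog", -1), ("dig", -1)]
print(lipschitz_classify("sitten", sample, edit_distance))
print(lipschitz_classify("dug", sample, edit_distance))
```

Each query here scans every training point. Replacing the two `min` scans with an approximate nearest neighbor search structure is the kind of speedup the paper targets, with a cost that depends on the doubling dimension.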

Findings

01. The algorithm's efficiency depends on the doubling dimension of the data.

02. Experimental results show superiority over some kernel methods.

03. The paper provides sharper risk bounds for nearest neighbor classifiers.

Abstract

Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as string edit and earthmover distance. A general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the questions of computational efficiency and of providing direct bounds on generalization error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points, and can thus achieve superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm's generalization performance is guaranteed via the fat-shattering…
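For readers unfamiliar with the doubling dimension: it is the base-2 logarithm of the smallest number λ such that every ball in the metric space can be covered by λ balls of half its radius. The sketch below gives a crude empirical estimate on a finite sample by greedily covering each sampled ball with half-radius balls. The function names, the choice of radii, and the greedy strategy are assumptions for illustration. Greedy covers are generally not minimal, so the estimate can overshoot, and a finite sample only probes the scales it contains.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def greedy_cover_size(
    points: Sequence[T], radius: float, dist: Callable[[T, T], float]
) -> int:
    """Greedily pick centers so every point lies within `radius` of
    some center; return the number of centers chosen."""
    centers = []
    for p in points:
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)
    return len(centers)


def estimate_doubling_dimension(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    radii: Sequence[float],
) -> float:
    """Heuristic estimate: log2 of the largest greedy half-radius cover
    found over all sampled balls B(c, r), with c in `points` and r in
    `radii`. Not an exact computation of the doubling dimension."""
    worst = 1
    for c in points:
        for r in radii:
            ball = [p for p in points if dist(c, p) <= r]
            worst = max(worst, greedy_cover_size(ball, r / 2, dist))
    return math.log2(worst)


# Hypothetical example: integer points on a line with absolute difference.
line = list(range(16))
estimate = estimate_doubling_dimension(line, lambda a, b: abs(a - b), [2, 4, 8])
print(estimate)
```

A small estimate suggests the data are intrinsically low-dimensional even when the ambient representation (long strings, high-dimensional histograms) is not, which is the regime where doubling-dimension-based bounds are most useful.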
