Distributed Nearest Neighbor Classification

Jiexin Duan; Xingye Qiao; Guang Cheng

arXiv:1812.05005 · math.ST · December 13, 2018 · 1 citation

Open Access

TL;DR

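This paper studies nearest neighbor classification under a distributed learning framework for big data settings. With majority voting across subsamples, the distributed classifier matches the convergence rate of its oracle version in both regret and instability, up to a multiplicative constant that depends only on the data dimension; weighted voting removes that constant.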

Contribution

It analyzes a distributed nearest neighbor classifier under majority and weighted voting, and provides sharp theoretical upper bounds on the number of subsamples.

Findings

01

Weighted voting eliminates the dimension-dependent multiplicative constant incurred by majority voting.

02

The distributed classifier achieves the same rate of convergence as its oracle version in both regret and instability.

03

Numerical studies confirm practical effectiveness.

Abstract

Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed a considerable hurdle for nearest neighbor predictions, since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of both the regret and instability, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples in…
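
The scheme described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes binary labels in {0, 1}, Euclidean distance, and a random equal-size partition of the training data. The function names (`local_knn_fraction`, `distributed_knn_predict`) and parameters are hypothetical. In particular, the "weighted" branch averages the local class-1 fractions before thresholding, which is one plausible reading of weighted voting and is an assumption here, since the abstract does not define the weights.

```python
import numpy as np


def local_knn_fraction(X_train, y_train, x_query, k):
    """Fraction of class-1 labels among the k nearest neighbors of x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


def distributed_knn_predict(X, y, x_query, n_subsamples, k,
                            scheme="majority", seed=0):
    """Predict a binary label by combining local k-NN results.

    Assumption: the data are split at random into n_subsamples parts,
    and each part only sees its own points.
    """
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_subsamples)

    # Each subsample reports the class-1 fraction among its own k neighbors.
    fractions = np.array(
        [local_knn_fraction(X[idx], y[idx], x_query, k) for idx in parts]
    )

    if scheme == "majority":
        # Each subsample casts a hard 0/1 vote; ties go to class 1.
        local_votes = (fractions >= 0.5).astype(int)
        return int(local_votes.mean() >= 0.5)
    if scheme == "weighted":
        # Assumed form of weighted voting: average the local fractions,
        # then threshold once at 1/2.
        return int(fractions.mean() >= 0.5)
    raise ValueError(f"unknown scheme: {scheme!r}")
```

The difference between the two branches is where the thresholding happens: majority voting discards how confident each subsample was, while the averaged version keeps that information until the final decision.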

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Machine Learning and Data Classification