Distributed Nearest Neighbor Classification
Jiexin Duan, Xingye Qiao, Guang Cheng

TL;DR
This paper develops a distributed nearest neighbor classification framework that maintains optimal convergence rates in big data settings, using voting schemes and theoretical bounds to improve scalability and performance.
Contribution
It introduces a distributed nearest neighbor classifier with weighted voting, providing theoretical bounds and demonstrating its effectiveness over traditional methods.
Findings
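With majority voting, the distributed classifier matches the oracle's rate of convergence in regret and instability, up to a multiplicative constant that depends only on the data dimension.
Replacing majority voting with weighted voting eliminates that multiplicative constant.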
Numerical studies confirm practical effectiveness.
Abstract
Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed a considerable hurdle for nearest neighbor predictions since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of both the regret and instability, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples in…
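The divide-and-vote idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`local_knn_estimate`, `distributed_knn`), the random disjoint split, the binary 0/1 labels, and the unweighted k-nearest-neighbor rule on each subsample are all hypothetical choices made here. In particular, "weighted voting" is read as averaging the local class-1 proportions before thresholding at 1/2, which is one plausible interpretation that the excerpt above does not confirm.

```python
import numpy as np

def local_knn_estimate(X, y, x, k):
    """Fraction of class-1 labels among the k nearest neighbors of x in (X, y)."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argpartition(dists, k - 1)[:k]  # indices of the k smallest distances
    return y[nearest].mean()

def distributed_knn(X, y, x, k, s, rng, voting="majority"):
    """Classify x from s disjoint random subsamples of (X, y); labels are 0/1.

    Assumes each subsample holds at least k points.
    """
    parts = np.array_split(rng.permutation(len(X)), s)
    estimates = np.array(
        [local_knn_estimate(X[idx], y[idx], x, k) for idx in parts]
    )
    if voting == "majority":
        # Each subsample casts a hard 0/1 vote; the majority wins.
        local_labels = estimates >= 0.5
        return int(local_labels.mean() >= 0.5)
    # Assumed reading of "weighted voting": average the local
    # class-1 proportions, then threshold once at 1/2.
    return int(estimates.mean() >= 0.5)
```

On well-separated classes both schemes return the same label. They can differ near the decision boundary, where hard local votes discard how confident each subsample was.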
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Machine Learning and Data Classification
