Social Distancing is Good for Points too!
Alejandro Flores-Velazco

TL;DR
This paper analyzes the FCNN algorithm for nearest-neighbor condensation, shows that it can select many more points than necessary when points lie too close together, and proposes a modification that admits provable upper bounds and approximation guarantees on the size of the selected subset.
Contribution
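The paper provides a theoretical analysis of FCNN, an algorithm for which theoretical results are scarce. It shows that FCNN behaves poorly on closely spaced points, and it introduces a modification that requires selected points to keep some distance from each other, which is sufficient to prove upper bounds and approximation guarantees.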
Findings
FCNN can select many more points than necessary when data points are very close to each other.
The modified FCNN avoids these cases by requiring that selected points keep some distance from each other.
Theoretical upper bounds and approximation guarantees are established for the size of the subset selected by the modified algorithm.
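For context, the unmodified FCNN iteration can be sketched as below. This is a minimal sketch of FCNN as it is commonly described in the condensation literature, not the paper's modified algorithm, whose exact distance rule is not stated in this summary. The function name `fcnn` and the sample data are illustrative assumptions.

```python
from math import dist

def fcnn(points, labels):
    """Sketch of plain FCNN: returns sorted indices of the selected subset.

    Assumes every point has the same dimension. Not the modified
    algorithm from the paper.
    """
    dim = len(points[0])
    selected = []
    # Seed: for each class, take the point closest to the class centroid.
    for c in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == c]
        centroid = tuple(sum(points[i][k] for i in idx) / len(idx)
                         for k in range(dim))
        selected.append(min(idx, key=lambda i: dist(points[i], centroid)))

    while True:
        # rep[s] = misclassified point closest to s among the points
        # whose nearest selected point is s (the Voronoi cell of s).
        rep = {}
        for i, p in enumerate(points):
            if i in selected:
                continue  # a selected point is its own nearest neighbor
            s = min(selected, key=lambda j: dist(points[j], p))
            if labels[s] != labels[i]:
                if s not in rep or dist(points[s], p) < dist(points[s], points[rep[s]]):
                    rep[s] = i
        if not rep:
            return sorted(selected)  # every point is classified correctly
        # Add all representatives found in this round as one batch.
        selected.extend(rep.values())

# Illustrative data: two well-separated classes on a line.
P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (6.0, 0.0), (7.0, 0.0)]
y = ["a", "a", "a", "b", "b", "b"]
subset = fcnn(P, y)
```

Because each round adds a whole batch of representatives, two points from neighboring cells can be added in the same round even when they lie very close to each other. That is one plausible reading of how closely spaced points inflate the selected subset.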
Abstract
The nearest-neighbor rule is a well-known classification technique that, given a training set P of labeled points, classifies any unlabeled query point with the label of its closest point in P. The nearest-neighbor condensation problem aims to reduce the training set without harming the accuracy of the nearest-neighbor rule. FCNN is the most popular algorithm for condensation. It is heuristic in nature, and theoretical results for it are scarce. In this paper, we settle the question of whether reasonable upper-bounds can be proven for the size of the subset selected by FCNN. First, we show that the algorithm can behave poorly when points are too close to each other, forcing it to select many more points than necessary. We then successfully modify the algorithm to avoid such cases, thus imposing that selected points should "keep some distance". This modification is sufficient to prove…
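The nearest-neighbor rule described above, together with one way to check that a condensed subset does no harm on the training set, can be sketched as follows. The names `nn_label` and `is_consistent` and the sample data are illustrative assumptions. Checking that every training point is still classified correctly is one common formalization of condensation, not necessarily the exact criterion used in the paper.

```python
from math import dist

def nn_label(points, labels, query):
    """Return the label of the point in `points` closest to `query`.

    Ties are broken in favor of the earliest point in the list.
    """
    best = min(range(len(points)), key=lambda i: dist(points[i], query))
    return labels[best]

def is_consistent(points, labels, subset_idx):
    """True if the subset alone classifies every training point correctly."""
    sub_pts = [points[i] for i in subset_idx]
    sub_lbl = [labels[i] for i in subset_idx]
    return all(nn_label(sub_pts, sub_lbl, p) == l
               for p, l in zip(points, labels))

# Illustrative training set P with two classes.
P = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
y = ["red", "red", "blue", "blue"]

query_label = nn_label(P, y, (0.2, 0.0))  # closest point is (0.1, 0.0)
two_points_ok = is_consistent(P, y, [0, 2])  # one point per class
one_point_ok = is_consistent(P, y, [0])      # only a red point kept
```

Here two of the four points are enough to classify the whole training set, while a single point is not, which is the kind of reduction a condensation algorithm looks for.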
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques
