Improving the accuracy of nearest-neighbor classification using principled construction and stochastic sampling of training-set centroids
Stephen Whitelam

TL;DR
This paper improves the accuracy of nearest-neighbor classification by coarse graining the training set into centroids and stochastically sampling distinct sets of those centroids, which increases how much of configuration space the training data cover.
Contribution
The author proposes combining coarse graining (replacing groups of training images by their centroids) with stochastic sampling (using distinct sets of centroids in combination) to improve nearest-neighbor classification.
Findings
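On MNIST and Fashion-MNIST, a principled coarse-graining algorithm converts training images into fewer centroids without loss of test-set accuracy under nearest-neighbor classification.
Using distinct batches of centroids in combination (stochastic sampling) improves classification accuracy.
The method brings nearest-neighbor classification into the upper ranks of machine-learning techniques.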
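To make the coarse-graining idea concrete, here is a minimal sketch on synthetic 2D data. It is an illustration under stated assumptions, not the paper's algorithm: the principled coarse-graining procedure is not described in this summary, so plain per-class k-means is substituted for it, and the function names (`nn_predict`, `coarse_grain`) and all parameter values are hypothetical.

```python
import numpy as np

def nn_predict(ref_x, ref_y, queries):
    """Label each query with the label of its nearest reference point (Euclidean)."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ ref_x.T
        + (ref_x ** 2).sum(axis=1)[None, :]
    )
    return ref_y[np.argmin(d2, axis=1)]

def coarse_grain(x, y, per_class, rng, n_iter=10):
    """Replace each class's points with `per_class` centroids.

    ASSUMPTION: ordinary k-means (Lloyd iterations) stands in for the
    paper's coarse-graining algorithm, which this summary does not specify.
    """
    cents, labels = [], []
    for c in np.unique(y):
        xc = x[y == c]
        k = min(per_class, len(xc))
        # Initialise centers from k distinct points of this class.
        centers = xc[rng.choice(len(xc), size=k, replace=False)].copy()
        for _ in range(n_iter):
            assign = nn_predict(centers, np.arange(k), xc)
            for j in range(k):
                members = xc[assign == j]
                if len(members):  # leave an empty cluster's center unchanged
                    centers[j] = members.mean(axis=0)
        cents.append(centers)
        labels.append(np.full(k, c))
    return np.vstack(cents), np.concatenate(labels)

# Synthetic stand-in for image data: two well-separated Gaussian classes.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [6.0, 6.0]])
train_y = rng.integers(0, 2, size=400)
train_x = means[train_y] + rng.normal(size=(400, 2))
test_y = rng.integers(0, 2, size=200)
test_x = means[test_y] + rng.normal(size=(200, 2))

cent_x, cent_y = coarse_grain(train_x, train_y, per_class=5, rng=rng)
acc_raw = (nn_predict(train_x, train_y, test_x) == test_y).mean()
acc_cg = (nn_predict(cent_x, cent_y, test_x) == test_y).mean()
print(len(train_x), "training points ->", len(cent_x), "centroids")
print("accuracy raw:", acc_raw, "coarse-grained:", acc_cg)
```

The toy data are far easier than MNIST or Fashion-MNIST, so the printed accuracies say nothing about the paper's results; the sketch only shows the mechanics of classifying against centroids instead of raw training points.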
Abstract
A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered…
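One possible reading of "using distinct sets of centroids in combination" is sketched below: several centroid batches are built from different random groupings of the training data, pooled, and searched together by a nearest-neighbor rule. Both the random grouping and the pooling are assumptions made for illustration; the abstract does not say how batches are built or combined, and `random_centroid_batch`, `nn_predict`, and the parameter values are hypothetical.

```python
import numpy as np

def random_centroid_batch(x, y, group_size, rng):
    """One batch: shuffle each class, split it into groups, average each group.

    ASSUMPTION: random grouping is a crude stand-in for the paper's
    principled coarse-graining step.
    """
    cents, labels = [], []
    for c in np.unique(y):
        xc = x[y == c]
        xc = xc[rng.permutation(len(xc))]
        n_groups = max(1, len(xc) // group_size)
        for group in np.array_split(xc, n_groups):
            cents.append(group.mean(axis=0))
            labels.append(c)
    return np.array(cents), np.array(labels)

def nn_predict(ref_x, ref_y, queries):
    """Label each query with the label of its nearest reference point."""
    diff = queries[:, None, :] - ref_x[None, :, :]
    return ref_y[np.argmin((diff ** 2).sum(axis=2), axis=1)]

# Synthetic stand-in for image data: two well-separated Gaussian classes.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [6.0, 6.0]])
train_y = rng.integers(0, 2, size=300)
train_x = means[train_y] + rng.normal(size=(300, 2))
test_y = rng.integers(0, 2, size=100)
test_x = means[test_y] + rng.normal(size=(100, 2))

# Each call reshuffles the data, so each batch is a different set of centroids.
batches = [random_centroid_batch(train_x, train_y, 10, rng) for _ in range(4)]

# ASSUMPTION: "in combination" is read here as pooling all batches
# into a single reference set for one nearest-neighbor search.
pool_x = np.vstack([b[0] for b in batches])
pool_y = np.concatenate([b[1] for b in batches])
acc = (nn_predict(pool_x, pool_y, test_x) == test_y).mean()
print(len(pool_x), "pooled centroids, accuracy:", acc)
```

Other combination rules, such as a majority vote over per-batch predictions, would fit the same description; pooling is chosen here only because it keeps the classifier a single nearest-neighbor search.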
Taxonomy
Topics: Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques · Bayesian Methods and Mixture Models
