Geometrical Homogeneous Clustering for Image Data Reduction
Shril Mody, Janvi Thakkar, Devvrat Joshi, Siddharth Soni, Rohan Patil, Nipun Batra

TL;DR
This paper introduces four novel variations of a homogeneous clustering algorithm to reduce dataset size for image classification, achieving high accuracy and significant data reduction on multiple datasets.
Contribution
The paper proposes four new clustering-based data reduction methods that improve dataset efficiency while maintaining high classification accuracy.
Findings
GHCIDR achieved up to 99.35% accuracy on MNIST.
Data reduction reached up to 87.27% on MNIST.
The proposed methods outperform the RHC baseline in accuracy and data efficiency.
Abstract
In this paper, we present novel variations of an earlier approach, the homogeneous clustering algorithm, for reducing dataset size. The intuition behind the proposed approaches is to partition the dataset into homogeneous clusters and select the images that contribute significantly to accuracy. The selected images are a proper subset of the training data and are therefore human-readable. We propose four variations of the baseline algorithm, RHC. The intuition behind the first approach, RHCKON, is that boundary points contribute significantly to the representation of clusters. It selects the k farthest neighbours and the one nearest neighbour of each cluster's centroid. In the next two approaches (KONCW and CWKC), we introduce the concept of cluster weights, based on the observation that larger clusters contribute more than smaller clusters. The final…
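The selection step described for RHCKON (the k farthest points plus the one nearest point to a cluster's centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_k_farthest_one_nearest` is hypothetical, Euclidean distance on flattened pixel vectors is an assumption, and the input is assumed to be a single cluster that has already been made homogeneous.

```python
import numpy as np


def select_k_farthest_one_nearest(points, k):
    """Pick the point nearest the centroid plus the k points farthest from it.

    Hypothetical helper for illustration only. `points` is an (n, d) array
    holding the members of one cluster; Euclidean distance is an assumption.
    Returns the sorted indices of the selected rows.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    dists = np.linalg.norm(points - centroid, axis=1)

    # Indices ordered from nearest to farthest; stable sort keeps ties deterministic.
    order = np.argsort(dists, kind="stable")
    nearest = order[:1]
    # Take the k farthest among the remaining points (none when k == 0).
    farthest = order[1:][-k:] if k > 0 else order[:0]
    return np.unique(np.concatenate([nearest, farthest]))


# Toy cluster of five 2-D points standing in for flattened images.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [-10.0, 0.0], [0.0, 0.5]])
selected = select_k_farthest_one_nearest(cluster, k=2)
reduced_cluster = cluster[selected]
```

Because the function returns row indices, the reduced set consists of original training images rather than synthetic averages, which matches the abstract's point that the selected images remain human-readable.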
Taxonomy
Topics: Image Retrieval and Classification Techniques · AI in cancer detection · Advanced Image and Video Retrieval Techniques
