A Maximal Heterogeneity Based Clustering Approach for Obtaining Samples
Megha Mishra, Chandrasekaran Anirudh Bhardwaj, and Kalyani Desikan

TL;DR
This paper introduces the Wobbly Center Algorithm, a novel clustering-based sampling method that maximizes heterogeneity within samples, providing more consistent and representative samples for medical and social science applications.
Contribution
The paper presents a new non-statistical, no-replacement sampling technique that builds clusters by maximizing internal heterogeneity, outperforming existing sampling methods.
Findings
The Wobbly Center Algorithm produces more consistent samples.
Statistical validation shows the samples are representative.
The algorithm outperforms other sampling techniques on benchmark datasets.
Abstract
The medical and social sciences demand sampling techniques that are robust, reliable, and replicable, and that yield the least dissimilarity between the samples obtained. The majority of sampling applications use randomized sampling, albeit with stratification where applicable. The randomized technique is not consistent: it may provide different samples each time, and those samples may not be similar to each other. In this paper, we introduce a novel non-statistical, no-replacement sampling technique called the Wobbly Center Algorithm, which builds clusters iteratively by maximizing the heterogeneity inside each cluster. The algorithm works on the principle of stepwise building of clusters by finding the points with the maximal distance from the cluster center. The obtained results are validated statistically using Analysis of Variance tests by comparing the…
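The stepwise idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `wobbly_center_sketch`, the random seeding of each cluster, the round-robin order, the Euclidean distance, and the use of the running mean as the cluster center are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def wobbly_center_sketch(X, k, seed=0):
    """Split the rows of X into k samples without replacement.

    Hypothetical sketch: each cluster repeatedly takes the unassigned
    point farthest from its current center, then the center is
    recomputed (so it "wobbles" as the cluster grows).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    rng = np.random.default_rng(seed)
    unassigned = [int(i) for i in rng.permutation(n)]

    # Assumption: seed each cluster with one randomly chosen point.
    clusters = [[unassigned.pop()] for _ in range(k)]
    centers = [X[c[0]].copy() for c in clusters]

    while unassigned:
        for j in range(k):
            if not unassigned:
                break
            # Distance from every unassigned point to cluster j's center.
            dists = np.linalg.norm(X[unassigned] - centers[j], axis=1)
            # Take the farthest point to maximize heterogeneity in cluster j.
            idx = unassigned.pop(int(np.argmax(dists)))
            clusters[j].append(idx)
            # Recompute the center as the mean of the cluster's members.
            centers[j] = X[clusters[j]].mean(axis=0)

    return clusters
```

Each returned list holds row indices of one sample; every row appears in exactly one sample, and sample sizes differ by at most one under the round-robin assumption.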
Taxonomy
Topics: Biometric Identification and Security · Face and Expression Recognition · Bayesian Methods and Mixture Models
