On Generalizations of Some Distance Based Classifiers for HDLSS Data
Sarbojit Roy, Soham Sarkar, Subhajit Dutta, Anil K. Ghosh

TL;DR
This paper introduces new transformations and a grouping-based generalization of distance-based classifiers to improve classification accuracy in high-dimensional, low sample size data, especially when populations differ beyond location and scale.
Contribution
It proposes simple transformations and a variable grouping approach for distance classifiers, extending their applicability beyond location and scale differences in HDLSS data.
Findings
Proposed classifiers outperform existing methods in simulations.
Theoretical analysis confirms high-dimensional consistency.
Real data experiments demonstrate practical advantages.
Abstract
In high dimension, low sample size (HDLSS) settings, classifiers based on Euclidean distances like the nearest neighbor classifier and the average distance classifier perform quite poorly if differences between locations of the underlying populations get masked by scale differences. To rectify this problem, several modifications of these classifiers have been proposed in the literature. However, existing methods are confined to location and scale differences only, and often fail to discriminate among populations differing outside of the first two moments. In this article, we propose some simple transformations of these classifiers resulting in improved performance even when the underlying populations have the same location and scale. We further propose a generalization of these classifiers based on the idea of grouping of variables. The high-dimensional behavior of the proposed…
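The masking effect described in the abstract can be illustrated with a minimal sketch of the plain average distance classifier. This is only the baseline the abstract criticizes, not the transformations or the grouping-based generalization proposed in the paper. The function name `average_distance_classify`, the two Gaussian populations, the dimension, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np

def average_distance_classify(train_0, train_1, test):
    """Label each test row 0 or 1 by the smaller mean Euclidean distance."""
    # Pairwise distances of shape (n_test, n_train), then averaged per class.
    d0 = np.linalg.norm(test[:, None, :] - train_0[None, :, :], axis=2).mean(axis=1)
    d1 = np.linalg.norm(test[:, None, :] - train_1[None, :, :], axis=2).mean(axis=1)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(0)
dim, n_train, n_test = 2000, 10, 20  # HDLSS: dimension far exceeds sample size

# Assumed populations (not from the paper): a small location shift (0.5 per
# coordinate) combined with a much larger scale difference (sd 1 versus sd 2).
def sample_0(n):
    return rng.normal(0.0, 1.0, size=(n, dim))

def sample_1(n):
    return rng.normal(0.5, 2.0, size=(n, dim))

train_0, train_1 = sample_0(n_train), sample_1(n_train)
pred_0 = average_distance_classify(train_0, train_1, sample_0(n_test))
pred_1 = average_distance_classify(train_0, train_1, sample_1(n_test))

acc_0 = float(np.mean(pred_0 == 0))  # accuracy on points drawn from class 0
acc_1 = float(np.mean(pred_1 == 1))  # accuracy on points drawn from class 1
print(f"class 0 accuracy: {acc_0:.2f}, class 1 accuracy: {acc_1:.2f}")
```

Under these assumed settings, distances concentrate so strongly that essentially every test point, including those drawn from the high-variance class, ends up closer on average to the low-variance class. The location shift is present but is swamped by the scale gap, which is the failure mode the abstract refers to.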
Taxonomy
Topics: Face and Expression Recognition · Statistical Methods and Inference · Gene expression and cancer classification
