Optimal Test-Data Piling in HDLSS Classification with Covariance Heterogeneity
Taehyun Kim, Jeongyoun Ahn, Sungkyu Jung

TL;DR
This paper investigates data piling phenomena in high-dimensional classification with heterogeneous covariance, identifying an optimal direction for perfect separation and proposing an algorithm to find it using training data.
Contribution
It extends the understanding of data piling to heterogeneous covariance structures and introduces a method to compute the optimal piling direction in high-dimensional settings.
Findings
Optimal direction maximizes class separation in data piling.
Imbalance of tail eigenvalues is the main obstacle to finding the optimal direction.
Proposed algorithm effectively identifies the optimal direction in simulations.
Abstract
This work addresses a longstanding question in high-dimensional linear classification: Is perfect classification achievable in heterogeneous covariance structures? We focus on the phenomenon of data piling, where projected data points collapse onto discrete values. We provide a comprehensive characterization of two distinct types of data piling. The first type of data piling refers to the phenomenon where projecting the training data onto a certain direction yields exactly two distinct values, one for each class. This occurs universally when the data dimension p exceeds the sample size n. The second type concerns independent test data and arises asymptotically as p → ∞ with fixed n. While previous work established the existence of such double data piling under homogeneously spiked covariance structures using negatively ridged classifiers, our analysis extends to the more…
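The first type of data piling described in the abstract can be reproduced numerically. The sketch below is only an illustration under simple assumptions (Gaussian toy data with unequal class variances, and a minimum-norm least-squares direction computed with a pseudoinverse); it is not the algorithm proposed in the paper and says nothing about the second, test-data type of piling. The function name `piling_direction` and all sample sizes are hypothetical choices made for this example.

```python
import numpy as np

def piling_direction(X, y):
    """Minimum-norm unit direction w for which the centered training data
    project exactly onto the centered labels (requires p > n, generic data)."""
    Xc = X - X.mean(axis=0)          # center the features
    yc = y - y.mean()                # center the +/-1 labels
    w = np.linalg.pinv(Xc) @ yc      # minimum-norm solution of Xc w = yc
    return w / np.linalg.norm(w)

# Toy data (assumed for illustration): p much larger than n,
# with different variances in the two classes.
rng = np.random.default_rng(0)
n1, n2, p = 10, 15, 200
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n1, p)),
    rng.normal(0.3, 2.0, size=(n2, p)),
])
y = np.concatenate([-np.ones(n1), np.ones(n2)])

w = piling_direction(X, y)
proj = X @ w

# Within each class the projections should coincide up to rounding error,
# while the two class values stay clearly apart.
spread_1 = proj[:n1].max() - proj[:n1].min()
spread_2 = proj[n1:].max() - proj[n1:].min()
gap = abs(proj[:n1].mean() - proj[n1:].mean())
print(spread_1, spread_2, gap)
```

Because the centered data matrix has rank n - 1 when p > n and the data are in general position, the centered label vector lies in its column space, so the linear system is solved exactly and each class collapses to a single projected value.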
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Statistical Methods and Bayesian Inference
