Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data
Qiyi Lu, Xingye Qiao

TL;DR
This paper introduces a significance testing method for high-dimensional, low-sample size data with partial labels, improving the detection of class differences by utilizing all available data more effectively.
Contribution
It proposes a novel significance analysis approach for partially labeled high-dimensional data, enhancing power while maintaining size, with theoretical and empirical validation.
Findings
In simulations, the method outperforms traditional tests that ignore label information.
Theoretical analysis confirms the method's validity in high-dimensional, low-sample size settings.
A real data example demonstrates the practical usefulness of the approach.
Abstract
Classification and clustering are both important topics in statistical learning. A natural question herein is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be interested in knowing whether the two classes defined by some class labels (when they are provided), or the two clusters tagged by a clustering algorithm (when class labels are not provided), are from the same underlying distribution. Although both are challenging questions for high-dimensional, low-sample size data, there have been recent developments for each. However, when it is costly to manually label observations, often only a small portion of the class labels is available. In this article, we propose a significance analysis approach for this type of data, namely partially labeled data. Our method makes use of the…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models · Statistical Methods and Inference
