Rare and Weak effects in Large-Scale Inference: methods and phase diagrams
Jiashun Jin, Tracy Ke

TL;DR
This paper introduces a theoretical framework for analyzing rare and weak effects in large-scale data, using phase diagrams to visualize the limits of detection and selection, and demonstrates the optimality of Higher Criticism and Graphlet Screening methods.
Contribution
It develops an asymptotic model and phase diagram approach to evaluate and compare methods for rare/weak signal detection and variable selection, establishing their optimality.
Findings
HC and GS achieve optimal phase diagrams in Asymptotic Rare/Weak (ARW) settings
Phase diagrams visualize detectability thresholds for rare/weak effects
HC and GS outperform traditional methods in challenging regimes
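
To make the idea of a detectability threshold concrete, here is a minimal sketch of the classical detection boundary for the sparse normal-means setting. This page does not state the formula. It is the standard boundary from the Higher Criticism literature, under the assumed calibration that a fraction n^(-beta) of the n features carry a signal of amplitude sqrt(2 r log n). The function names are hypothetical, chosen for illustration.

```python
import math


def detection_boundary(beta: float) -> float:
    """Classical detection boundary rho*(beta) for sparse normal means.

    Assumed calibration (not stated on this page): a fraction n**(-beta)
    of the features are non-null, each with mean sqrt(2 * r * log(n)).
    Defined here only for the sparse regime 1/2 < beta < 1.
    """
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in the open interval (1/2, 1)")
    if beta <= 0.75:
        # Moderately sparse regime: the boundary is linear in beta.
        return beta - 0.5
    # Very sparse regime: the boundary curves upward toward 1.
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def is_detectable(beta: float, r: float) -> bool:
    """True if (beta, r) lies strictly above the detection boundary."""
    return r > detection_boundary(beta)
```

Plotting `detection_boundary` over beta in (1/2, 1) gives the curve that splits the (beta, r) plane into a detectable and an undetectable region, which is the kind of picture a phase diagram presents.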
Abstract
Often when we deal with "Big Data", the true effects we are interested in are Rare and Weak (RW). Researchers measure a large number of features, hoping to find perhaps only a small fraction of them to be relevant to the research in question; the effect sizes of the relevant features are individually small so the true effects are not strong enough to stand out for themselves. Higher Criticism (HC) and Graphlet Screening (GS) are two classes of methods that are specifically designed for the Rare/Weak settings. HC was introduced to determine whether there are any relevant effects in all the measured features. More recently, HC was applied to classification, where it provides a method for selecting useful predictive features for trained classification rules. GS was introduced as a graph-guided multivariate screening procedure, and was used for variable selection. We develop a theoretic…
Taxonomy
Topics: Software Engineering Research · Machine Learning in Materials Science · Advanced Text Analysis Techniques
