Hypothesis testing for high-dimensional sparse binary regression
Rajarshi Mukherjee, Natesh S. Pillai, Xihong Lin

TL;DR
This paper investigates the limits of hypothesis testing in high-dimensional sparse binary regression models, revealing new phenomena related to design matrix sparsity and proposing optimal testing procedures.
Contribution
It introduces a design matrix sparsity index that, together with signal strength, determines the detection boundary, and it develops rate-optimal tests for different sparsity regimes.
Findings
The detection boundary depends on both the design matrix sparsity index and the signal strength.
The generalized likelihood ratio test is rate-optimal in the dense regime.
An extended Higher Criticism test is optimal in the sparse regime.
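To make the Higher Criticism idea concrete, the following is a minimal sketch of the classical Higher Criticism statistic in the Donoho-Jin form, computed from a list of p-values. It is not the paper's extended version for binary regression, whose definition is not given in this summary. The function name `higher_criticism` and the parameter `alpha0` (the fraction of smallest p-values scanned) are assumptions chosen for illustration.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Classical Higher Criticism statistic computed from p-values.

    Returns the maximum, over the smallest alpha0 fraction of p-values, of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(i) is the i-th smallest p-value.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    sorted_p = sorted(p_values)
    upper = max(1, int(alpha0 * n))  # number of order statistics to scan
    best = -math.inf
    for i in range(1, upper + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Illustrative inputs (made up, not data from the paper):
# evenly spread p-values mimic the null; a few tiny ones mimic sparse signals.
null_p = [(i - 0.5) / 100 for i in range(1, 101)]
signal_p = [1e-6] * 5 + null_p[5:]
hc_null = higher_criticism(null_p)
hc_signal = higher_criticism(signal_p)
```

With these made-up inputs, `hc_signal` is far larger than `hc_null`. This shows why the statistic suits sparse alternatives: a handful of very small p-values is enough to inflate the maximum even when most coordinates carry no signal.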
Abstract
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of the detection boundary that does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with a sparsity index that is not too high, our results are parallel to those…
