Distribution-free Detection of a Submatrix
Ery Arias-Castro, Yuchao Liu

TL;DR
This paper studies distribution-free detection of a submatrix with elevated values in a large data matrix. Calibrating the scan test by permutation achieves the same first-order asymptotic performance as the parametric test, while rank-based variants incur a loss in asymptotic power that the paper quantifies.
Contribution
It demonstrates that permutation calibration achieves asymptotic performance similar to parametric methods in submatrix detection, and analyzes rank-based variants for nonparametric detection.
Findings
Permutation calibration matches parametric test performance asymptotically
Rank-based methods have quantifiable power loss
Method applicable in distribution-free settings
Abstract
We consider the problem of detecting the presence of a submatrix with larger-than-usual values in a large data matrix. This problem was considered in (Butucea and Ingster, 2013) under a one-parameter exponential family, and one of the tests they analyzed is the scan test. Taking a nonparametric stance, we show that a calibration by permutation leads to the same (first-order) asymptotic performance. This is true for the two types of permutations we consider. We also study the corresponding rank-based variants and precisely quantify the loss in asymptotic power.
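To make the idea of calibrating a scan test by permutation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (scan_statistic, permutation_pvalue, rank_transform) are hypothetical, the submatrix size k-by-l is assumed known, and the permutation scheme used (shuffling all entries of the matrix uniformly at random) is an assumption, since the abstract does not spell out the two permutation types. The scan is brute force over row subsets, so it is only practical for small matrices, and the p-value is a Monte Carlo approximation over random permutations.

```python
import itertools
import numpy as np


def scan_statistic(X, k, l):
    """Largest sum of entries over all k-by-l submatrices of X.

    Brute force over row subsets. For a fixed set of rows, the best
    l columns are those with the l largest restricted column sums.
    """
    m, _ = X.shape
    best = -np.inf
    for rows in itertools.combinations(range(m), k):
        col_sums = X[list(rows), :].sum(axis=0)
        best = max(best, np.sort(col_sums)[-l:].sum())
    return best


def permutation_pvalue(X, k, l, n_perm=200, seed=0):
    """Monte Carlo permutation p-value for the scan statistic.

    Assumption: the null distribution is approximated by shuffling
    all entries of X uniformly at random.
    """
    rng = np.random.default_rng(seed)
    observed = scan_statistic(X, k, l)
    exceed = 0
    for _ in range(n_perm):
        Xp = rng.permutation(X.ravel()).reshape(X.shape)
        if scan_statistic(Xp, k, l) >= observed:
            exceed += 1
    # Adding one to numerator and denominator keeps the p-value valid.
    return (exceed + 1) / (n_perm + 1)


def rank_transform(X):
    """Replace each entry by its rank (1 = smallest) among all entries.

    Assumes no ties, as with continuous data.
    """
    ranks = np.argsort(np.argsort(X.ravel())) + 1
    return ranks.reshape(X.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((8, 8))
    X[:3, :3] += 3.0  # plant an elevated 3-by-3 submatrix
    print("permutation p-value:", permutation_pvalue(X, 3, 3))
    print("rank-based p-value:", permutation_pvalue(rank_transform(X), 3, 3))
```

The rank-based variant here simply applies the same permutation test to the matrix of ranks, which makes the procedure depend on the data only through their ordering. Shuffling all entries is the simplest choice that preserves the pooled set of values; other schemes would change only the line that builds Xp.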
Taxonomy
Topics: Gene expression and cancer classification · Bayesian Methods and Mixture Models · Statistical Methods and Inference
