Do more observations bring more information in rare events?
Danyang Huang, Liyuan Wang, Liping Zhu

TL;DR
This paper shows that in independence testing for rare events, the number of rare events, not the total number of observations, determines test power. This motivates a covariance rescaling and a subsampling procedure that improve computational efficiency.
Contribution
It introduces a novel rescaling of covariances and a boosted subsampling procedure that maintains power while reducing computational complexity in rare-event analysis.
Findings
Test power depends on the number of rare events, not the total sample size.
The boosted procedure achieves nearly full-data power with less computation.
Theoretical analysis, including the asymptotic distribution and local power, supports the proposed methods.
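The first finding can be made concrete with a toy model that is an assumption of this sketch, not the paper's setting: let Y ~ Bernoulli(p) mark a rare event and X = beta * Y + eps with eps ~ N(0, 1) independent of Y. Then Cov(X, Y) = beta * p(1 - p), the correlation rho shrinks to zero as p decreases, and the approximate mean of the standardized statistic sqrt(n) * rho_hat behaves like beta * sqrt(n p), that is, beta times the square root of the expected number of rare events. The helper names `toy_correlation` and `noncentrality` are hypothetical.

```python
import math

def toy_correlation(beta: float, p: float) -> float:
    """Population correlation between X = beta*Y + eps and Y ~ Bernoulli(p),
    with eps ~ N(0, 1) independent of Y (hypothetical toy model)."""
    var_y = p * (1.0 - p)
    cov_xy = beta * var_y              # Cov(beta*Y + eps, Y) = beta * Var(Y)
    var_x = beta**2 * var_y + 1.0      # Var(beta*Y) + Var(eps)
    return cov_xy / math.sqrt(var_x * var_y)

def noncentrality(beta: float, n: int, p: float) -> float:
    """Approximate mean of sqrt(n) * rho_hat under the alternative (small rho);
    the power of a correlation-based test is driven by this quantity."""
    return math.sqrt(n) * toy_correlation(beta, p)

beta = 0.5
# Same expected number of rare events (n * p = 100) but 100x more observations,
# then 100x more rare events at the larger n.
for n, p in [(10_000, 0.01), (1_000_000, 0.0001), (1_000_000, 0.01)]:
    print(f"n={n:>9,}  p={p:<7}  E[n1]={n * p:>8,.0f}  "
          f"rho={toy_correlation(beta, p):.5f}  "
          f"sqrt(n)*rho={noncentrality(beta, n, p):.3f}")
```

In this toy model the first two configurations give nearly the same noncentrality (about 5) even though the second uses 100 times as many observations, while increasing the number of rare events raises it roughly tenfold.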
Abstract
It is generally believed that more observations provide more information. However, we observe that in the independence test for rare events, the power of the test is, surprisingly, determined by the number of rare events rather than the total sample size. Moreover, the correlations tend to shrink to zero even as the total sample size increases, as long as the proportion of rare events decreases. We demonstrate this phenomenon in both fixed and high-dimensional settings. To address these issues, we first rescale the covariances to account for the presence of rare events. We then propose a boosted procedure that uses only a small subset of non-rare events, yet achieves nearly the same power as using the full set of observations. As a result, computational complexity is significantly reduced. The theoretical properties, including asymptotic distribution and local power analysis, are…
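The abstract does not spell out the boosted procedure, so the sketch below only illustrates the general idea of keeping every rare event and using a small subset of non-rare events. It uses a generic technique swapped in for illustration, case-control style subsampling with inverse-probability weights (each kept non-rare event is weighted by n0 / m), and is not the authors' method. The helper names `weighted_cov` and `subsampled_cov`, the subsample size m = 5 * n1, the toy data model, and the `cov / p_hat` rescaling are all hypothetical choices.

```python
import random

def weighted_cov(xs, ys, ws):
    """Weighted covariance: sum w (x - xbar)(y - ybar) / sum w."""
    total = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / total
    ybar = sum(w * y for w, y in zip(ws, ys)) / total
    return sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys)) / total

def subsampled_cov(xs, ys, m, rng):
    """Keep every rare event (y == 1), draw m non-rare events (y == 0) uniformly
    without replacement, and weight each by n0 / m (inverse-probability weights)
    so the weighted subsample stands in for the full data."""
    rare = [i for i, y in enumerate(ys) if y == 1]
    common = [i for i, y in enumerate(ys) if y == 0]
    kept = rng.sample(common, m)
    idx = rare + kept
    ws = [1.0] * len(rare) + [len(common) / m] * m
    return weighted_cov([xs[i] for i in idx], [ys[i] for i in idx], ws)

# Hypothetical toy data: Y ~ Bernoulli(p), X = beta * Y + N(0, 1) noise.
rng = random.Random(0)
n, p, beta = 200_000, 0.005, 0.5
ys = [1 if rng.random() < p else 0 for _ in range(n)]
xs = [beta * y + rng.gauss(0.0, 1.0) for y in ys]

n1 = sum(ys)
full = weighted_cov(xs, ys, [1.0] * n)
sub = subsampled_cov(xs, ys, m=5 * n1, rng=rng)
p_hat = n1 / n  # hypothetical rescaling by the observed rare-event proportion
print(f"rare events: {n1}, rows used: {6 * n1} of {n}")
print(f"full-data cov: {full:.5f}  subsampled cov: {sub:.5f}")
print(f"rescaled (cov / p_hat): full {full / p_hat:.3f}, subsampled {sub / p_hat:.3f}")
```

The subsampled estimate uses about 3% of the rows yet stays close to the full-data covariance, because the rare events, which carry most of the signal, are all retained. Dividing by `p_hat` keeps the statistic from shrinking toward zero as events become rarer; the paper's own rescaling may differ.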
Taxonomy
Topics: Scientific Computing and Data Management · Seismology and Earthquake Studies
