Do more observations bring more information in rare events?

Danyang Huang; Liyuan Wang; Liping Zhu

arXiv:2506.13671·stat.ME·June 17, 2025

Open Access

TL;DR

This paper shows that in independence tests for rare events, test power is determined by the number of rare events rather than the total sample size. This motivates rescaled covariances and a subsampling procedure that keeps only a small subset of non-rare events to reduce computation.

Contribution

It introduces a novel rescaling of covariances and a boosted subsampling procedure that maintains power while reducing computational complexity in rare event analysis.

Findings

01

Test power depends on the number of rare events, not total sample size.

02

The boosted procedure achieves near full-data power with less computation.

03

Theoretical analysis, including the asymptotic distribution and a local power analysis, supports the proposed methods.

Abstract

It is generally believed that more observations provide more information. However, we observe that in the independence test for rare events, the power of the test is, surprisingly, determined by the number of rare events rather than the total sample size. Moreover, the correlations tend to shrink to zero even as the total sample size increases, as long as the proportion of rare events decreases. We demonstrate this phenomenon in both fixed and high-dimensional settings. To address these issues, we first rescale the covariances to account for the presence of rare events. We then propose a boosted procedure that uses only a small subset of non-rare events, yet achieves nearly the same power as using the full set of observations. As a result, computational complexity is significantly reduced. The theoretical properties, including asymptotic distribution and local power analysis, are…
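The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' rescaling or boosted procedure; it assumes a toy model chosen for illustration (a binary rare-event indicator y with rate p, and a covariate x whose mean shifts by a hypothetical amount delta when y = 1), and all function names are made up for this example. Under this model the population correlation is roughly delta * sqrt(p * (1 - p)), which shrinks as p decreases, while the statistic sqrt(n) * r stays close to delta * sqrt(number of rare events). Keeping all rare events plus a fixed multiple of non-rare ones should therefore lose little.

```python
import numpy as np


def simulate(n, p, delta, rng):
    """Toy model (an assumption): y ~ Bernoulli(p), x ~ N(delta * y, 1)."""
    y = (rng.random(n) < p).astype(float)
    x = rng.normal(size=n) + delta * y
    return x, y


def corr_z(x, y):
    """Return the sample Pearson correlation r and the statistic sqrt(n) * r."""
    r = np.corrcoef(x, y)[0, 1]
    return r, np.sqrt(len(x)) * r


def subsample_non_rare(x, y, ratio, rng):
    """Keep every rare event (y == 1) and at most `ratio` times as many
    non-rare observations, drawn uniformly without replacement."""
    rare = np.flatnonzero(y == 1)
    non_rare = np.flatnonzero(y == 0)
    k = min(len(non_rare), ratio * len(rare))
    keep = np.concatenate([rare, rng.choice(non_rare, size=k, replace=False)])
    return x[keep], y[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hold the expected number of rare events at 200 while n grows.
    for n, p in [(2_000, 0.1), (20_000, 0.01), (200_000, 0.001)]:
        x, y = simulate(n, p, delta=0.5, rng=rng)
        r, z = corr_z(x, y)
        xs, ys = subsample_non_rare(x, y, ratio=10, rng=rng)
        r_sub, z_sub = corr_z(xs, ys)
        print(f"n={n:>7} rare={int(y.sum()):>4} r={r:.4f} z={z:.2f} "
              f"| subsample n={len(ys):>5} r={r_sub:.4f} z={z_sub:.2f}")
```

In this toy setting the correlation on the full data should fall as n grows with the rare-event count held fixed, while the z-statistic should stay of similar magnitude, and the subsample with ten non-rare observations per rare event should give a comparable z-statistic from far fewer rows.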

Taxonomy

Topics: Scientific Computing and Data Management · Seismology and Earthquake Studies