Mastering Rare Event Analysis: Optimal Subsample Size in Logistic and Cox Regressions
Tal Agassi, Nir Keret, Malka Gorfine

TL;DR
This paper develops methods to determine the optimal subsample size for logistic and Cox regression models, especially in rare event and imbalanced data scenarios, improving computational efficiency and analysis accuracy.
Contribution
It introduces new tools and procedures for selecting optimal subsample sizes in Cox and logistic regressions, addressing a key gap in existing subsampling methodologies.
Findings
Tools effectively select optimal subsample sizes in simulations
Procedures improve analysis accuracy in large datasets
Demonstrated on real-world datasets with rare events
Abstract
In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal subsampling methods for conducting analyses on subsamples with minimized efficiency loss, they notably lack tools for judiciously selecting the optimal subsample size. To bridge this gap, our work introduces tools designed for choosing the optimal subsample size. We focus on three settings: the Cox regression model for survival data with rare events and logistic regression for both balanced and imbalanced datasets. Additionally, we present a novel optimal subsampling procedure tailored for logistic regression with imbalanced data. The efficacy of these tools and procedures is demonstrated through an extensive simulation study and meticulous analyses of…
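To make the role of subsample size concrete, here is a minimal Python sketch of one simple subsampling scheme for imbalanced logistic regression. It is not the paper's optimal procedure or its size-selection tool. As an assumption for illustration, it keeps every event, draws non-events uniformly at random, and fits an inverse-probability-weighted logistic regression. The function names `fit_weighted_logistic` and `subsample_fit`, the simulated data, and the choices of non-events per event are all hypothetical.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def subsample_fit(X, y, n_controls, rng):
    """Keep all events, sample n_controls non-events uniformly, reweight."""
    events = np.flatnonzero(y == 1)
    controls = np.flatnonzero(y == 0)
    picked = rng.choice(controls, size=n_controls, replace=False)
    idx = np.concatenate([events, picked])
    w = np.ones(idx.size)
    # Each sampled non-event stands in for controls.size / n_controls of them.
    w[events.size:] = controls.size / n_controls
    return fit_weighted_logistic(X[idx], y[idx], w)

# Simulated rare-event data (assumed setup, event rate well below 1%).
rng = np.random.default_rng(0)
n = 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-6.0, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

full = fit_weighted_logistic(X, y, np.ones(n))  # full-data benchmark
n_events = int(y.sum())
n_nonevents = n - n_events

estimates = {}
for m in (1, 5, 20):  # non-events sampled per event
    n_controls = min(m * n_events, n_nonevents)
    estimates[m] = subsample_fit(X, y, n_controls, rng)
    print(m, n_controls, estimates[m] - full)
```

The printed differences from the full-data fit give a rough sense of how much is lost at each subsample size. The trade-off between that loss and the computational saving is what a subsample-size selection tool has to quantify.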
Taxonomy
Topics: Probability and Risk Models · Statistical Methods and Inference
