Asymptotic Inference for Infinitely Imbalanced Logistic Regression
Dorian Goldman, Bo Zhang

TL;DR
This paper develops a second order asymptotic expansion for the slope in logistic regression on an infinitely imbalanced dataset, showing that the second order term is asymptotically normal with a variance that depends only on the minority class mean, and giving explicit variance formulas.
Contribution
It extends Owen's (2007) work by deriving a second order expansion for the slope parameter in highly imbalanced logistic regression, with explicit variance calculations.
Findings
Second order term converges to a normal distribution.
Variance depends only on the minority class mean.
Results confirmed by Monte Carlo simulations.
Abstract
In this paper we extend the work of Owen (2007) by deriving a second order expansion for the slope parameter in logistic regression when the size of the majority class is unbounded and the minority class is finite. More precisely, we demonstrate that the second order term converges to a normal distribution and explicitly compute its variance, which, surprisingly, once again depends only on the mean of the minority class points and not on their arrangement, under mild regularity assumptions. In the case that the majority class is normally distributed, we illustrate that the variance of the limiting slope depends exponentially on the z-score of the average of the minority class's points with respect to the majority class's distribution. We confirm our results by Monte Carlo simulations.
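The first order behaviour that the abstract builds on can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a one-dimensional covariate, a standard normal majority class, and two hypothetical minority samples that share the same mean. Under Owen's (2007) limit the slope solves an exponential tilting equation for the majority distribution, which for a N(mu, sigma^2) majority reduces to (x_bar - mu) / sigma^2. The helper name fit_logistic_1d, the sample sizes, and the seed are illustrative choices only.

```python
import math
import random


def fit_logistic_1d(minority, majority, iters=50, tol=1e-10):
    """Newton-Raphson MLE for P(y=1 | x) = 1 / (1 + exp(-(a + b*x))).

    minority holds the covariates with y = 1, majority those with y = 0.
    """
    n, N = len(minority), len(majority)
    a, b = math.log(n / N), 0.0  # start from the intercept-only MLE
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xs, y in ((minority, 1.0), (majority, 0.0)):
            for x in xs:
                p = 1.0 / (1.0 + math.exp(-(a + b * x)))
                w = p * (1.0 - p)
                g_a += y - p          # score for the intercept
                g_b += (y - p) * x    # score for the slope
                h_aa += w             # entries of the information matrix
                h_ab += w * x
                h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det  # solve the 2x2 Newton system
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
majority = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # assumed N(0, 1) majority

# Two hypothetical minority samples with the same mean but different spread.
minority_a = [0.4, 0.9, 1.2, 1.5]
minority_b = [-0.5, 2.5, 1.0, 1.0]
x_bar = sum(minority_a) / len(minority_a)

alpha_a, beta_a = fit_logistic_1d(minority_a, majority)
alpha_b, beta_b = fit_logistic_1d(minority_b, majority)

# For a N(0, 1) majority the limiting slope is x_bar itself.
print(f"x_bar = {x_bar:.3f}")
print(f"slope (sample a) = {beta_a:.3f}, intercept = {alpha_a:.3f}")
print(f"slope (sample b) = {beta_b:.3f}, intercept = {alpha_b:.3f}")
```

With a large majority sample, both fitted slopes should land close to the minority mean while the intercept becomes strongly negative, which matches the claim that only the mean of the minority points matters in the limit. Repeating the fit over many independent majority draws would give a rough Monte Carlo picture of the fluctuation that the second order term describes.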
Taxonomy
Topics: Statistical Methods and Inference · Survey Sampling and Estimation Techniques · Advanced Statistical Methods and Models
