Efficient posterior sampling for high-dimensional imbalanced logistic regression
Deborshee Sen, Matthias Sachs, Jianfeng Lu, David Dunson

TL;DR
This paper introduces a novel importance-weighted and mini-batch sub-sampling approach for Bayesian high-dimensional imbalanced logistic regression, significantly improving efficiency and accuracy over existing methods.
Contribution
It generalizes piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling for imbalanced data.
Findings
Outperforms current competitors in efficiency and accuracy
Maintains correct stationary distribution with small sub-samples
Provides theoretical support and empirical validation
Abstract
High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as the sample size n and/or the number of predictors p increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piece-wise deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in…
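To make the sub-sampling point concrete, here is a minimal sketch. It is not the authors' piece-wise deterministic sampler. It only compares two unbiased single-observation estimators of the full logistic-regression gradient on synthetic imbalanced data. All names, the synthetic data, and the choice of weights proportional to per-observation gradient norms are assumptions made for illustration. Computing those norms requires a full pass over the data, so a practical scheme would presumably use cheaper bounds.

```python
import numpy as np

def per_observation_gradients(theta, X, y):
    """Gradient of each observation's negative log-likelihood, shape (n, p)."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (probs - y)[:, None] * X

def estimator_mean(grads, weights):
    """Exact expectation of grads[i] / weights[i] when i is drawn with probability weights[i]."""
    return (weights[:, None] * (grads / weights[:, None])).sum(axis=0)

def estimator_variance(grads, weights):
    """Exact total variance (trace of the covariance) of the same single-draw estimator."""
    second_moment = (np.sum(grads ** 2, axis=1) / weights).sum()
    return second_moment - np.sum(grads.sum(axis=0) ** 2)

# Synthetic imbalanced data (assumption): 20 positive responses out of 2000.
rng = np.random.default_rng(0)
n, p, n_pos = 2000, 5, 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # first column is an intercept
y = np.zeros(n)
y[:n_pos] = 1.0
theta = np.zeros(p)
theta[0] = -4.0  # negative intercept, so predicted probabilities are small

grads = per_observation_gradients(theta, X, y)

# Uniform sub-sampling: every observation is equally likely to be drawn.
uniform_w = np.full(n, 1.0 / n)

# Importance-weighted sub-sampling (illustrative choice): probability
# proportional to the norm of each observation's gradient.
norms = np.linalg.norm(grads, axis=1)
importance_w = norms / norms.sum()

var_uniform = estimator_variance(grads, uniform_w)
var_importance = estimator_variance(grads, importance_w)
print("total variance, uniform sub-sampling:   ", var_uniform)
print("total variance, importance sub-sampling:", var_importance)

# One stochastic draw of the importance-weighted estimator of the full gradient.
i = rng.choice(n, p=importance_w)
one_draw = grads[i] / importance_w[i]
```

Both estimators have the full-data gradient as their expectation. The difference is in variance: under uniform sampling the rare positive observations are seldom drawn but contribute large gradients, whereas weighting the draw toward them keeps the estimator unbiased and reduces its spread.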
Taxonomy
Topics: Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods · Statistical Methods and Inference
