Logistic regression models for aggregated data
Tom Whitaker, Boris Beranger, Scott A. Sisson

TL;DR
This paper introduces a histogram-based approximation method for logistic regression that reduces computational cost on large datasets while keeping classification accuracy comparable to a full-data analysis.
Contribution
The paper develops a one-versus-rest approximate composite likelihood, built from low-dimensional marginal histograms of the predictors, to enable efficient inference for large-scale logistic regression models.
Findings
Comparable classification accuracy to full data analysis
Substantially lower computational cost
Effective on large real-world datasets
Abstract
Logistic regression models are a popular and effective method for predicting the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarise the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost.…
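The idea of fitting a model to a histogram summary rather than to the raw data can be illustrated with a deliberately simplified sketch. It is not the paper's composite likelihood estimator: it assumes a single predictor and a binary response, and it swaps in a cruder technique, replacing every observation by its bin midpoint and weighting each midpoint by the bin count. The function names `fit_logistic` and `histogram_summary`, the synthetic data, and the choice of 30 bins are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, w=None, n_iter=25):
    """Weighted binary logistic regression fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(X)), X])       # prepend an intercept column
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))         # fitted probabilities
        grad = X.T @ (w * (y - p))                  # weighted score vector
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess + 1e-9 * np.eye(len(beta)), grad)
    return beta

def histogram_summary(x, y, n_bins=30):
    """Per-class bin counts of x on a shared equal-width grid (illustrative)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    n0, _ = np.histogram(x[y == 0], bins=edges)
    n1, _ = np.histogram(x[y == 1], bins=edges)
    return mids, n0, n1

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.5])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = (rng.random(n) < prob).astype(float)

# Full-data fit: every Newton step touches all n rows.
beta_full = fit_logistic(x[:, None], y)

# Histogram fit: every Newton step touches only 2 * n_bins weighted rows.
mids, n0, n1 = histogram_summary(x, y)
X_hist = np.concatenate([mids, mids])[:, None]
y_hist = np.concatenate([np.zeros_like(mids), np.ones_like(mids)])
w_hist = np.concatenate([n0, n1]).astype(float)
beta_hist = fit_logistic(X_hist, y_hist, w_hist)

print("full data :", beta_full)
print("histogram :", beta_hist)
```

In this toy setting the summary has 60 weighted rows instead of 200,000 observations, so each optimisation step is far cheaper. A single joint histogram becomes impractical as the number of predictors grows, which is the motivation the abstract gives for working with low-dimensional marginal histograms.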
