Logistic regression models for aggregated data

Tom Whitaker; Boris Beranger; Scott A. Sisson

arXiv:1912.03805·stat.ME·August 25, 2020·J. Comput. Graph. Stat.

TL;DR

This paper introduces a histogram-based approximation method for logistic regression that reduces computational costs while maintaining classification accuracy on large datasets.

Contribution

It develops a novel composite likelihood approach using histograms to enable efficient inference for large-scale logistic regression models.

Findings

01 Comparable classification accuracy to full data analysis

02 Significantly lower computational cost

03 Effective on large real-world datasets

Abstract

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarise the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full-data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost.…
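
To make the idea of fitting on a histogram summary concrete, the following is a minimal sketch, not the paper's method. It assumes a single predictor and a binary response, replaces each observation by the midpoint of its histogram bin, and fits a count-weighted logistic regression by Newton's method. The midpoint approximation, the function names `aggregate_to_histogram` and `fit_weighted_logistic`, and the simulated data are all illustrative assumptions; the paper's composite likelihood for histogram-valued data is not reproduced here.

```python
import numpy as np


def aggregate_to_histogram(x, y, n_bins):
    """Summarise (x, y) as per-bin class counts over equal-width bins.

    Hypothetical helper: returns bin midpoints and the number of
    class-0 and class-1 observations falling in each bin.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Digitising against the interior edges yields indices 0..n_bins-1.
    idx = np.digitize(x, edges[1:-1])
    n_total = np.bincount(idx, minlength=n_bins).astype(float)
    n1 = np.bincount(idx, weights=y, minlength=n_bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return mids, n_total - n1, n1


def fit_weighted_logistic(mids, n0, n1, n_iter=25):
    """Newton iterations for a binomial logistic model on bin midpoints.

    Assumption: every observation in a bin is treated as if it sat at
    the bin midpoint, so the cost per iteration depends on the number
    of bins rather than the number of observations.
    """
    X = np.column_stack([np.ones_like(mids), mids])
    n = n0 + n1
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (n1 - n * p)          # score of the binomial log-likelihood
        w = n * p * (1.0 - p)              # per-bin Fisher weights
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated data (illustrative only): intercept -0.5, slope 1.5.
rng = np.random.default_rng(0)
x = rng.normal(size=20_000)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = (rng.random(x.size) < prob).astype(float)

mids, n0, n1 = aggregate_to_histogram(x, y, n_bins=30)
beta_hat = fit_weighted_logistic(mids, n0, n1)
print(beta_hat)
```

In this sketch the fit touches only 30 bins rather than 20,000 observations, which is the source of the computational saving; the price is the approximation error from collapsing each bin to a single point.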
