Generalized Linear Models for Aggregated Data
Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo

TL;DR
This paper introduces a method for fitting generalized linear models when features are observed for each individual but the targets are released only in aggregated form, such as histograms or order statistics, enabling accurate individual-level inference despite the aggregation.
Contribution
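It proposes a simple algorithm, motivated by a connection to permutation testing, that estimates model parameters and individual-level predictions from aggregated targets by alternating between imputing the targets and standard generalized linear model fitting. The approach extends to histograms of varying granularity.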
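The alternating scheme for histogram targets can be pictured with the sketch below. It assumes squared loss (a Gaussian GLM with identity link). The function name `fit_from_histogram`, the rank-based bin assignment and the clipping rule are illustrative assumptions, not necessarily the authors' procedure. For fixed coefficients, giving the lowest-ranked predictions to the lowest bins and clipping each prediction into its bin is the least-squares-optimal way to respect the bin counts, so the recorded loss should not increase from one iteration to the next. The result still depends on the starting point `w0`.

```python
import numpy as np

def fit_from_histogram(X, edges, counts, w0=None, n_iter=200):
    """Hypothetical sketch: alternate between imputing targets consistent with a
    histogram and refitting ordinary least squares (Gaussian GLM, identity link).

    X      : (n, d) individual-level features
    edges  : (K + 1,) increasing bin edges of the target histogram
    counts : (K,) number of individuals whose target falls in each bin (sums to n)
    """
    n, d = X.shape
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts)
    if counts.sum() != n:
        raise ValueError("bin counts must sum to the number of rows in X")
    # Rank r (0 = lowest prediction) is tied to the bin holding the r-th smallest
    # target, so each bin receives exactly counts[k] individuals.
    bin_of_rank = np.repeat(np.arange(len(counts)), counts)
    lo, hi = edges[bin_of_rank], edges[bin_of_rank + 1]
    w = np.zeros(d) if w0 is None else np.asarray(w0, dtype=float)
    y = np.empty(n)
    losses = []
    for _ in range(n_iter):
        pred = X @ w
        # Individuals ordered from lowest to highest prediction.
        order = np.argsort(pred, kind="stable")
        # Imputation step: move each prediction the minimum distance needed to
        # land inside the bin assigned to its rank.
        y[order] = np.clip(pred[order], lo, hi)
        # Fit step: ordinary least squares on the imputed targets.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        losses.append(float(np.sum((y - X @ w) ** 2)))
    return w, y, losses
```

Inputs of the expected shape come directly from NumPy: `counts, edges = np.histogram(y, bins=10)`. Coarser histograms leave wider intervals for the clipping step, which loosens the constraint the counts place on the coefficients.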
Findings
The method is effective at predicting individual-level targets from coarse histograms when the underlying relationship is linear.
The approach performs well on simulated and healthcare data, with diminishing returns as histogram granularity increases.
The method bridges the gap between aggregated data and individual-level inference in statistical modeling.
Abstract
Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing, a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual-level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest the effectiveness of the proposed approach when,…
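In the limiting case where targets are known only up to permutation, a GLM with canonical link has log-likelihood Σ_i [y_i η_i − A(η_i)] plus terms that depend only on the multiset of targets. For fixed coefficients, the assignment of targets therefore matters only through Σ_i y_π(i) η_i, which the rearrangement inequality maximizes by matching sorted targets to sorted linear predictors. The sketch below applies that imputation step with a Poisson GLM (log link) fitted by iteratively reweighted least squares. The helper names `poisson_irls` and `fit_permuted_poisson`, the Poisson choice and the iteration counts are assumptions for illustration, not the authors' code.

```python
import numpy as np

def poisson_irls(X, y, w, n_iter=25):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    for _ in range(n_iter):
        eta = X @ w
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * mu            # working weights equal mu for the log link
        w = np.linalg.solve(XtW @ X, XtW @ z)
    return w

def fit_permuted_poisson(X, y_unordered, n_rounds=20):
    """Hypothetical sketch: alternating imputation when targets are known only
    up to permutation (an unordered multiset of observed values)."""
    y_sorted = np.sort(np.asarray(y_unordered, dtype=float))
    w = np.zeros(X.shape[1])
    y = np.empty_like(y_sorted)
    for _ in range(n_rounds):
        # Imputation step: the k-th smallest target goes to the individual with
        # the k-th smallest linear predictor (optimal for a canonical link).
        order = np.argsort(X @ w, kind="stable")
        y[order] = y_sorted
        # Fit step: standard GLM fit on the imputed targets, warm-started at w.
        w = poisson_irls(X, y, w)
    return w, y
```

Because a multiset of targets can be consistent with more than one ordering (for instance, a sign-flipped slope), restarting from several initial coefficient vectors and keeping the highest likelihood is a reasonable safeguard.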
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms · Statistical Methods and Inference
