Loglinear modelling of huge contingency tables
Veronica Vinciotti, Ernst C. Wit

TL;DR
This paper introduces an efficient method for inferring higher-order loglinear models in extremely large contingency tables by sampling empty cells and using a Poisson likelihood framework, enabling analysis of massive categorical datasets.
Contribution
The paper presents a novel approach that combines sampling of empty cells with iteratively re-weighted least squares (IRWLS) for scalable loglinear modelling of huge contingency tables, addressing the computational challenges of high-dimensional categorical data.
Findings
Method successfully applied to 70-dimensional social survey data.
Standard Poisson regression on the sampled data converges to the IRWLS scheme once the sample of empty cells exceeds the number of observations.
Handles contingency tables with trillions of cells efficiently.
Abstract
Contingency tables are a fundamental representation of multivariate categorical data. As the size of the contingency table grows exponentially with the number of variables, even a moderate number of variables, each with a moderate number of levels, will result in a huge number of cells, the majority of which will remain empty even with a significant amount of data. We propose an efficient method for inferring higher-order loglinear models in such scenarios. We tackle the computational challenge by using only a sample of the empty cells and deriving the associated likelihood under a Poisson sampling scheme. This allows us to define an iteratively re-weighted least squares (IRWLS) algorithm for parameter estimation. Under the extreme setting of huge contingency tables, we show how standard Poisson regression on the sampled data converges to this IRWLS scheme, when the number of sampled…
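The sampling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy table of 8 binary variables (small enough to fit the full table for comparison), a loglinear model with main effects and two-way interactions, a fixed-size uniform sample of empty cells, and an inverse-sampling-fraction weight on the sampled zeros as a stand-in for the likelihood derived in the paper. All function names (`design_row`, `sample_empty_cells`, `weighted_poisson_irwls`, `fit_from_cells`) and all parameter values are hypothetical.

```python
import itertools
import numpy as np

def design_row(cell):
    """Intercept, main effects and all two-way interactions of a binary cell."""
    x = np.asarray(cell, dtype=float)
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return np.concatenate(([1.0], x, pairs))

def sample_empty_cells(nonempty, d, m, rng):
    """Draw m distinct empty cells uniformly by rejection sampling.

    Requires m <= number of empty cells, otherwise the loop never ends.
    """
    sampled = set()
    while len(sampled) < m:
        cell = tuple(rng.integers(0, 2, size=d).tolist())
        if cell not in nonempty:
            sampled.add(cell)
    return sorted(sampled)

def weighted_poisson_irwls(X, y, w, n_iter=100, tol=1e-9):
    """Log-link Poisson regression with case weights w, fitted by IRWLS."""
    mu = y + 0.1                      # common GLM starting value
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * (w * mu)          # working weights are w * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

def fit_from_cells(counts, empty_cells, n_empty_total):
    """Fit on all non-empty cells plus a (sub)sample of the empty cells."""
    cells = list(counts) + list(empty_cells)
    X = np.array([design_row(c) for c in cells])
    y = np.array([counts.get(c, 0) for c in cells], dtype=float)
    w = np.ones(len(cells))
    # Assumption: each sampled zero stands in for n_empty_total / m empty cells.
    w[len(counts):] = n_empty_total / len(empty_cells)
    return weighted_poisson_irwls(X, y, w)

# Toy data: simulate the full table so the sampled fit can be checked against it.
rng = np.random.default_rng(0)
d = 8
all_cells = list(itertools.product([0, 1], repeat=d))
X_full = np.array([design_row(c) for c in all_cells])
beta_true = np.zeros(X_full.shape[1])
beta_true[0] = 4.5                    # intercept (illustrative value)
beta_true[1:d + 1] = -1.2             # main effects (illustrative values)
beta_true[d + 1] = 0.8                # one two-way interaction (variables 0 and 1)
y_full = rng.poisson(np.exp(X_full @ beta_true))

counts = {c: int(n) for c, n in zip(all_cells, y_full) if n > 0}
n_empty = len(all_cells) - len(counts)

beta_full = weighted_poisson_irwls(X_full, y_full.astype(float), np.ones(len(all_cells)))
beta_all = fit_from_cells(counts, sample_empty_cells(counts, d, n_empty, rng), n_empty)
m = n_empty // 2
empty_sub = sample_empty_cells(counts, d, m, rng)
beta_sub = fit_from_cells(counts, empty_sub, n_empty)

print("largest gap, full table vs all empty cells:", np.max(np.abs(beta_full - beta_all)))
print("largest gap, full table vs half of the empty cells:", np.max(np.abs(beta_full - beta_sub)))
```

When every empty cell is sampled, the weights equal 1 and the fit coincides with the full-table Poisson regression; with a subsample, the estimate is an approximation whose quality depends on the sample size. In a genuinely huge table the full design matrix could never be built, so only the non-empty cells and the sampled zeros would be materialised, and rejection sampling would almost never reject.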
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Data-Driven Disease Surveillance · HIV, Drug Use, Sexual Risk
