Sequential category aggregation and partitioning approaches for multi-way contingency tables based on survey and census data
L. Fraser Jackson, Alistair G. Gray, Stephen E. Fienberg

TL;DR
This paper introduces a systematic approach for aggregating and partitioning large multi-way contingency tables from survey and census data, simplifying complex data while preserving key interaction structures.
Contribution
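It proposes a method that searches a class of restricted log-linear models for the member that maximizes the likelihood of the data, then uses that model to reduce the number of categories of the variables in a large table, which makes the table easier to summarize and analyze.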
Findings
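The method systematically reduces the number of categories of each variable while, in the paper's examples, largely preserving the interaction structure found with hierarchical log-linear models.
It is applicable to large tables with millions of cells.
It provides a flexible data summarization tool for various disciplines.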
Abstract
Large contingency tables arise in many contexts but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables which are summaries of such contingency tables. We propose such an approach in this paper by finding members of a class of restricted log-linear models which maximize the likelihood of the data and use this to find a parsimonious means of representing the table. In contrast with more standard approaches for model search in hierarchical log-linear models (HLLM), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data…
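The idea of sequentially reducing categories can be illustrated with a small sketch. The code below is not the authors' procedure: it is a simplified greedy heuristic for a two-way table that, at each step, merges the pair of row categories whose merger loses the least likelihood-ratio statistic G^2 against the independence model. The function names (`g2_independence`, `merge_rows`, `greedy_row_aggregation`) and the toy counts are assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np


def g2_independence(table):
    """Likelihood-ratio statistic G^2 for independence in a two-way table."""
    table = np.asarray(table, dtype=float)
    total = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / total
    mask = table > 0  # cells with zero count contribute nothing to G^2
    return 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))


def merge_rows(table, i, j):
    """Return a new table with rows i and j summed and appended as the last row."""
    table = np.asarray(table, dtype=float)
    keep = [k for k in range(table.shape[0]) if k not in (i, j)]
    return np.vstack([table[keep], table[i] + table[j]])


def greedy_row_aggregation(table, labels, target_rows):
    """Merge row categories one pair at a time until target_rows remain.

    Illustrative heuristic only: each step picks the pair whose merger
    reduces G^2 the least, i.e. discards the least row-column association.
    """
    table = np.asarray(table, dtype=float)
    groups = [[label] for label in labels]
    while table.shape[0] > target_rows:
        base = g2_independence(table)
        i, j = min(
            combinations(range(table.shape[0]), 2),
            key=lambda pair: base - g2_independence(merge_rows(table, *pair)),
        )
        table = merge_rows(table, i, j)
        merged_group = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append(merged_group)
    return table, groups


# Hypothetical toy data: rows a and b are proportional, as are rows c and d,
# so each of those pairs can be merged without losing any association.
counts = np.array([[10, 20], [20, 40], [30, 10], [60, 20]])
labels = ["a", "b", "c", "d"]
reduced, groups = greedy_row_aggregation(counts, labels, target_rows=2)
print(groups)
print(g2_independence(counts), g2_independence(reduced))
```

In this toy case the two G^2 values agree up to floating-point error, because only rows with identical conditional distributions are merged. The paper's approach works with multi-way tables and a class of restricted log-linear models rather than the two-way independence model used here.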
