Maximum Likelihood Estimation in Latent Class Models For Contingency Table Data
S.E. Fienberg, P. Hersh, A. Rinaldo, Y. Zhou

TL;DR
This paper explores the geometric structure of latent class models for categorical data, analyzing maximum likelihood estimation challenges like non-identifiability and symmetry effects through theoretical and empirical examples.
Contribution
It provides a geometric perspective on latent class models, clarifies issues in maximum likelihood estimation, and introduces the '100 Swiss Francs' problem as a motivating example.
Findings
Identifies causes of non-identifiability in latent class models
Illustrates the impact of symmetric data on estimation
Analyzes the model's geometric structure and estimation difficulties
Abstract
Statistical models with latent structure have a history going back to the 1950s and have seen widespread use in the social sciences and, more recently, in computational biology and in machine learning. Here we study the basic latent class model proposed originally by the sociologist Paul F. Lazarsfeld for categorical variables, and we explain its geometric structure. We draw parallels between the statistical and geometric properties of latent class models and we illustrate geometrically the causes of many problems associated with maximum likelihood estimation and related statistical inference. In particular, we focus on issues of non-identifiability and determination of the model dimension, of maximization of the likelihood function and on the effect of symmetric data. We illustrate these phenomena with a variety of synthetic and real-life tables, of different dimension and complexity.…
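To make the abstract's terms concrete, here is a minimal sketch of fitting a latent class model to a two-way contingency table. It assumes the standard mixture form P[i][j] = sum over classes h of w[h] * a[h][i] * b[h][j] and uses the EM algorithm, which the abstract does not name as the authors' fitting method. The function names, the number of iterations, and the example counts are all hypothetical and are not taken from the paper.

```python
import math
import random


def mixture_table(w, a, b):
    """Cell probabilities P[i][j] = sum_h w[h] * a[h][i] * b[h][j]."""
    n_rows, n_cols = len(a[0]), len(b[0])
    return [[sum(w[h] * a[h][i] * b[h][j] for h in range(len(w)))
             for j in range(n_cols)] for i in range(n_rows)]


def em_latent_class(counts, n_classes, n_iter=200, seed=0):
    """EM sketch for a latent class model on a two-way table (assumed setup)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)

    def random_simplex(k):
        # Strictly positive starting values that sum to one.
        draws = [rng.random() + 0.1 for _ in range(k)]
        s = sum(draws)
        return [d / s for d in draws]

    w = random_simplex(n_classes)                       # class weights
    a = [random_simplex(n_rows) for _ in range(n_classes)]  # row margins per class
    b = [random_simplex(n_cols) for _ in range(n_classes)]  # column margins per class
    loglik = []

    for _ in range(n_iter):
        p = mixture_table(w, a, b)
        # Multinomial log-likelihood (up to a constant) at the current parameters.
        loglik.append(sum(counts[i][j] * math.log(p[i][j])
                          for i in range(n_rows) for j in range(n_cols)))
        # E-step: split each observed count across classes by posterior weight.
        exp_w = [0.0] * n_classes
        exp_a = [[0.0] * n_rows for _ in range(n_classes)]
        exp_b = [[0.0] * n_cols for _ in range(n_classes)]
        for i in range(n_rows):
            for j in range(n_cols):
                for h in range(n_classes):
                    e = counts[i][j] * w[h] * a[h][i] * b[h][j] / p[i][j]
                    exp_w[h] += e
                    exp_a[h][i] += e
                    exp_b[h][j] += e
        # M-step: renormalise the expected counts into probabilities.
        w = [exp_w[h] / total for h in range(n_classes)]
        a = [[exp_a[h][i] / exp_w[h] for i in range(n_rows)] for h in range(n_classes)]
        b = [[exp_b[h][j] / exp_w[h] for j in range(n_cols)] for h in range(n_classes)]

    return w, a, b, mixture_table(w, a, b), loglik


# Hypothetical counts for illustration only; not data from the paper.
table = [[10, 3, 2], [4, 12, 3], [1, 5, 9]]
w, a, b, fitted, loglik = em_latent_class(table, n_classes=2)

# Relabelling the latent classes changes the parameters but not the fitted table,
# which is the simplest form of non-identifiability.
swapped = mixture_table(w[::-1], a[::-1], b[::-1])
```

Running the sketch from several seeds and comparing the final entries of `loglik` is one simple way to see that EM may stop at different local maxima. The label swap at the end shows that distinct parameter vectors can yield the same fitted table.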
Taxonomy
Topics: Census and Population Estimation · Bayesian Methods and Mixture Models · Data-Driven Disease Surveillance
