Spectral Clustering with Likelihood Refinement for High-dimensional Latent Class Recovery
Zhongyuan Lyu, Yuqi Gu

TL;DR
This paper introduces a two-stage spectral clustering and likelihood refinement algorithm for high-dimensional latent class recovery, achieving theoretical optimality and superior empirical performance.
Contribution
It presents a novel, computationally efficient method combining spectral clustering with likelihood refinement, with proven minimax optimality for high-dimensional latent class recovery.
Findings
The method achieves exact clustering with high probability.
It outperforms existing methods in simulations and real data.
It provides a consistent estimator for the number of latent classes.
Abstract
Latent class models are widely used for identifying unobserved subgroups from multivariate categorical data in social sciences, with binary data as a particularly popular example. However, accurately recovering individual latent class memberships remains challenging, especially when handling high-dimensional datasets with many items. This work proposes a novel two-stage algorithm for latent class models suited for high-dimensional binary responses. Our method first initializes latent class assignments by an easy-to-implement spectral clustering algorithm, and then refines these assignments with a one-step likelihood-based update. This approach combines the computational efficiency of spectral clustering with the improved statistical accuracy of likelihood-based estimation. We establish theoretical guarantees showing that this method is minimax-optimal for latent class recovery in the…
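The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Bernoulli latent class model with a known number of classes k, uses an uncentered top-k SVD followed by a simple k-means for the spectral step, and applies a single plug-in likelihood reassignment for the refinement step. All function names (`kmeans`, `spectral_init`, `refine`, `simulate`, `accuracy`) and the clipping constant `eps` are hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance from each point to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_init(R, k):
    """Stage 1 (assumed form): k-means on the top-k left singular
    vectors of the binary response matrix R, scaled by singular values."""
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    return kmeans(U[:, :k] * s[:k], k)

def refine(R, labels, k, eps=1e-3):
    """Stage 2 (assumed form): estimate item parameters from the current
    labels, then reassign each row to the class maximizing its
    Bernoulli log-likelihood. One pass only."""
    J = R.shape[1]
    theta = np.full((k, J), 0.5)
    for j in range(k):
        if np.any(labels == j):
            theta[j] = R[labels == j].mean(axis=0)
    theta = np.clip(theta, eps, 1 - eps)  # avoid log(0)
    loglik = R @ np.log(theta).T + (1 - R) @ np.log(1 - theta).T
    return loglik.argmax(axis=1)

def simulate(n, J, k, seed=0):
    """Draw binary responses from a latent class model with random
    item parameters (simulation settings are illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.1, 0.9, size=(k, J))
    z = rng.integers(k, size=n)
    R = (rng.random((n, J)) < theta[z]).astype(float)
    return R, z

def accuracy(z_true, z_est, k):
    """Fraction of correct labels under the best label permutation."""
    return max(
        np.mean(np.array(p)[z_est] == z_true)
        for p in itertools.permutations(range(k))
    )

if __name__ == "__main__":
    R, z = simulate(n=200, J=300, k=3, seed=1)
    init = spectral_init(R, 3)
    refined = refine(R, init, 3)
    print("spectral accuracy:", accuracy(z, init, 3))
    print("refined accuracy: ", accuracy(z, refined, 3))
```

The refinement reuses the spectral labels only to estimate the item parameters, so its cost is a single pass over the data matrix, which is in keeping with the abstract's emphasis on computational efficiency.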
