Population-Based Hierarchical Non-negative Matrix Factorization for Survey Data
Xiaofu Ding, Xinyu Dong, Olivia McGough, Chenxin Shen, Annie Ulichney, Ruiyao Xu, William Swartworth, Jocelyn T. Chi, and Deanna Needell

TL;DR
This paper introduces PHNMF, a hierarchical non-negative matrix factorization method designed to uncover interpretable population structures in complex survey data, demonstrating high accuracy on synthetic and real datasets.
Contribution
The paper presents PHNMF, a new approach that automatically identifies hierarchical population structure in survey data of diverse types, improving interpretability and supporting downstream analysis.
Findings
PHNMF accurately recovers latent hierarchical structures.
The method reveals meaningful subpopulation groupings.
The recovered subpopulation structure can help improve downstream inference in survey data analysis.
Abstract
Motivated by the problem of identifying potential hierarchical population structure in modern survey data containing a wide range of complex data types, we introduce population-based hierarchical non-negative matrix factorization (PHNMF). PHNMF is a variant of hierarchical non-negative matrix factorization based on feature similarity. It provides an automatic and interpretable approach for identifying and understanding hierarchical structure in a data matrix constructed from a wide range of data types. Our numerical experiments on synthetic and real survey data demonstrate that PHNMF can recover latent hierarchical population structure in complex data with high accuracy. Moreover, the recovered subpopulation structure is meaningful and can be useful for improving downstream inference.
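The abstract does not spell out how PHNMF builds its similarity matrix or splits nodes, so the following is only a minimal sketch of the general idea of hierarchical NMF on a similarity matrix, not the authors' method. It assumes that rows are respondents already encoded as non-negative feature vectors, that "feature similarity" means cosine similarity between respondents, that each node is split with a rank-2 NMF fitted by Lee-Seung multiplicative updates, and that each respondent goes to the child whose L1-normalized factor column weights it most. The function names (`nmf_mu`, `cosine_similarity`, `hierarchical_nmf`, `make_toy_survey`) and the toy data are hypothetical.

```python
import numpy as np


def nmf_mu(X, rank, n_iter=1000, seed=0, eps=1e-10):
    """Fit X ~= W @ H with W, H >= 0 using Lee-Seung multiplicative
    updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cosine_similarity(X, eps=1e-12):
    """Respondent-by-respondent cosine similarity (assumed similarity
    measure); non-negative whenever X is non-negative."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)
    return Xn @ Xn.T


def hierarchical_nmf(X, max_depth=2, min_size=4, seed=0):
    """Hypothetical sketch: recursively split the rows of X in two with a
    rank-2 NMF of their similarity matrix.
    Returns {binary path string: array of row indices}."""
    leaves = {}

    def split(idx, path, depth):
        if depth == max_depth or len(idx) < min_size:
            leaves[path] = idx
            return
        W, _ = nmf_mu(cosine_similarity(X[idx]), rank=2, seed=seed)
        # Normalize columns so the two factors are compared on equal footing.
        W = W / (W.sum(axis=0, keepdims=True) + 1e-12)
        assign = np.argmax(W, axis=1)
        if assign.min() == assign.max():  # no split found: keep as a leaf
            leaves[path] = idx
            return
        for k in (0, 1):
            split(idx[assign == k], path + str(k), depth + 1)

    split(np.arange(X.shape[0]), "", 0)
    return leaves


def make_toy_survey(n_per_group=10, noise=0.05, seed=0):
    """Hypothetical toy data: 4 subpopulations nested in 2 parent populations."""
    rng = np.random.default_rng(seed)
    rows, labels = [], []
    for g in range(4):
        profile = np.zeros(10)
        profile[2 * g:2 * g + 2] = 1.0  # answers specific to subpopulation g
        profile[8 + g // 2] = 2.0       # answer shared with the sibling subpopulation
        for _ in range(n_per_group):
            rows.append(profile + noise * rng.random(10))
            labels.append(g)
    return np.array(rows), np.array(labels)


X, labels = make_toy_survey()
for path, idx in sorted(hierarchical_nmf(X).items()):
    print(path, sorted(set(labels[idx].tolist())))
```

On this toy data the first split should separate the two parent populations and the second split the four subpopulations. Recursive rank-2 splitting is one common way to grow an NMF hierarchy; PHNMF may construct its tree, similarity matrix, and stopping rule differently.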
Taxonomy
Topics: Recommender Systems and Techniques · Rural development and sustainability · Human Mobility and Location-Based Analysis
