Topic Modeling for Free-Response Text Data from a Complex Survey

Namitha V. Pais; Scott H. Holan; Paul A. Parker

arXiv:2501.13777 · stat.AP · January 24, 2025

TL;DR

This paper adapts topic modeling to complex survey data by incorporating survey weights into the Mixture of Unigrams model, so that themes extracted from open-ended survey responses account for the sample design rather than inheriting the bias of an unweighted analysis.

Contribution

It introduces a pseudolikelihood approach for the Mixture of Unigrams model under informative sampling and develops a hierarchical version accounting for respondent-level factors.
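As a point of reference, a common way to write the MoU likelihood for one response and a survey-weighted pseudolikelihood is sketched below. The notation is assumed here rather than taken from the paper: x_{iv} is the count of word v in response i, \pi_k are topic proportions, \beta_{kv} are topic-word probabilities, w_i are survey weights, S is the sample of size n, and the multinomial coefficient is dropped because it does not depend on the parameters. The paper's exact weighting or normalization may differ.

```latex
% Mixture of Unigrams: each response i is generated by a single topic k
p(\mathbf{x}_i \mid \boldsymbol{\pi}, \boldsymbol{\beta})
  = \sum_{k=1}^{K} \pi_k \prod_{v=1}^{V} \beta_{kv}^{\,x_{iv}}

% Survey-weighted pseudolikelihood: each response's likelihood is raised to its (scaled) weight
\mathcal{L}_{PL}(\boldsymbol{\pi}, \boldsymbol{\beta})
  = \prod_{i \in S} p(\mathbf{x}_i \mid \boldsymbol{\pi}, \boldsymbol{\beta})^{\tilde{w}_i},
\qquad
\tilde{w}_i = \frac{n\, w_i}{\sum_{j \in S} w_j}
```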

Findings

1. The weighted MoU effectively extracts meaningful topics from survey data.
2. The hierarchical MoU captures variations in topic proportions across respondent characteristics.
3. Application to American National Election Studies (ANES) data demonstrates improved interpretability of survey responses.

Abstract

Topic modeling is a popular statistical tool commonly used on textual data to identify the hidden thematic structure in a document collection based on the distribution of words. It can also be used to cluster documents, with each cluster representing a distinct topic. The Mixture of Unigrams (MoU) is a standard topic model for clustering document-term data and can be particularly useful for analyzing open-ended survey responses to extract meaningful information from the underlying topics. However, with complex survey designs, where data are often collected on individual (document) characteristics, it is essential to account for the sample design in order to avoid biased estimates. To address this issue, we propose the MoU model under informative sampling, using a pseudolikelihood that incorporates survey weights to account for the sample design. We evaluate the…
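One way to fit a survey-weighted MoU is an EM algorithm in which each response's contribution to the M-step is multiplied by its survey weight; the paper may instead use a different estimation scheme (for example, a Bayesian sampler). The sketch below is a minimal illustration under that assumption, not the authors' implementation: the function name `weighted_mou_em`, the random Dirichlet initialization, the rescaling of weights to sum to the sample size, and the additive smoothing `alpha` are all hypothetical choices. The toy counts and weights are made up purely to exercise the code.

```python
import numpy as np


def weighted_mou_em(X, w, K, n_iter=200, alpha=0.01, seed=0):
    """Hypothetical EM sketch for a Mixture of Unigrams under a weighted pseudolikelihood.

    X     : (n_docs, V) word counts, one row per survey response.
    w     : (n_docs,) survey weights; each response's log-likelihood is scaled by its weight.
    K     : number of topics (clusters).
    alpha : additive smoothing on topic-word counts (an assumption, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    # Rescale weights to sum to the sample size (a common convention; assumption).
    w = w * len(w) / w.sum()
    n, V = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                     # topic proportions
    beta = rng.dirichlet(np.ones(V), size=K)     # (K, V) random topic-word init

    for _ in range(n_iter):
        # E-step: log r_ik = log pi_k + sum_v x_iv log beta_kv, then normalize per row.
        log_r = np.log(pi) + X @ np.log(beta).T  # (n, K)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: each response counts w_i times in the expected sufficient statistics.
        wr = w[:, None] * r                      # (n, K)
        pi = wr.sum(axis=0) / wr.sum()
        counts = wr.T @ X + alpha                # (K, V) weighted expected word counts
        beta = counts / counts.sum(axis=1, keepdims=True)

    return pi, beta, r


# Toy example (made-up data): responses 0-1 use words 0-1, responses 2-3 use words 2-3.
X = np.array([[6, 4, 0, 0],
              [5, 5, 0, 0],
              [0, 0, 7, 3],
              [0, 0, 4, 6]])
w = np.array([1.0, 2.0, 1.5, 0.5])              # hypothetical survey weights
pi, beta, r = weighted_mou_em(X, w, K=2)
print("topic proportions:", pi.round(3))
print("hard cluster labels:", r.argmax(axis=1))
```

On this toy data the two groups of responses should separate, and the estimated proportion for the topic covering the first two responses should be close to 0.6, their share of the total weight, rather than the 0.5 an equally weighted fit would give. That shift is the kind of design effect the weighting is meant to capture.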

Taxonomy

Topics: Computational and Text Analysis Methods