Distributed Asymmetric Allocation: A Topic Model for Large Imbalanced Corpora in Social Sciences

Kohei Watanabe

arXiv:2512.18119·stat.ME·December 23, 2025

Open Access

TL;DR

This paper introduces Distributed Asymmetric Allocation (DAA), a new topic model designed for large, imbalanced social science corpora, which improves speed and accuracy over traditional LDA by optimizing Dirichlet priors.

Contribution

The paper presents DAA, a new topic model that integrates multiple algorithms to efficiently identify sentences about important topics in large, imbalanced corpora, and reports that it outperforms LDA in both accuracy and speed.

Findings

01. DAA classifies sentences more accurately than LDA.

02. DAA operates faster than traditional LDA.

03. Optimizing Dirichlet priors enhances content analysis accuracy.
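
To make the role of the Dirichlet prior in the last finding concrete, here is a minimal sketch of how a symmetric and an asymmetric prior shape the topic proportions that a topic model expects before seeing any data. It is a generic illustration, not the DAA procedure: the function names, the four-topic setup, and the prior values are all assumptions chosen for the example. Dirichlet draws are built from normalized gamma variates using only the Python standard library.

```python
import random


def expected_topic_shares(alpha):
    """Prior mean of the topic proportions: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_topic_shares(alpha, rng):
    """Draw one vector of topic proportions from Dirichlet(alpha).

    Uses the standard construction: independent Gamma(alpha_k, 1)
    variates divided by their sum.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def mean_topic_shares(alpha, n_docs=20000, seed=0):
    """Average topic proportions over n_docs simulated documents."""
    rng = random.Random(seed)
    sums = [0.0] * len(alpha)
    for _ in range(n_docs):
        for k, share in enumerate(sample_topic_shares(alpha, rng)):
            sums[k] += share
    return [s / n_docs for s in sums]


# Hypothetical priors over four topics (values chosen for illustration only).
symmetric = [0.5, 0.5, 0.5, 0.5]    # every topic equally likely a priori
asymmetric = [2.0, 0.5, 0.5, 0.5]   # first topic expected to dominate

if __name__ == "__main__":
    for name, alpha in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
        shares = mean_topic_shares(alpha)
        print(name, [round(s, 2) for s in shares])
```

Under the symmetric prior every topic receives the same expected share, whereas the asymmetric prior lets one topic take a larger share than the rest, which is the kind of flexibility an imbalanced corpus calls for. How DAA actually sets or optimizes its priors is described in the paper itself, not in this sketch.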

Abstract

Social scientists employ latent Dirichlet allocation (LDA) to find highly specific topics in large corpora, but they often struggle in this task because (1) LDA, in general, takes a significant amount of time to fit on large corpora; (2) unsupervised LDA fragments topics into sub-topics in short documents; (3) semi-supervised LDA fails to identify specific topics defined using seed words. To solve these problems, I have developed a new topic model called distributed asymmetric allocation (DAA) that integrates multiple algorithms for efficiently identifying sentences about important topics in large corpora. I evaluate the ability of DAA to identify politically important topics by fitting it to the transcripts of speeches at the United Nations General Assembly between 1991 and 2017. The results show that DAA can classify sentences significantly more accurately and quickly than LDA thanks…

Taxonomy

Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Topic Modeling