Interactive Topic Models with Optimal Transport
Garima Dhanania, Sheshera Mysore, Chau Minh Pham, Mohit Iyyer, Hamed Zamani, Andrew McCallum

TL;DR
This paper introduces EdTM, a label-name-supervised topic modeling approach that uses optimal transport to incorporate analyst knowledge and feedback, improving coherence and robustness over traditional models.
Contribution
EdTM models topic assignment as an optimal transport problem, enabling integration of analyst input and leveraging LLM-based affinities for improved topic coherence.
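To make the optimal transport framing concrete, here is a minimal sketch of assigning documents to topics with entropy-regularized optimal transport (Sinkhorn iterations). It is not the paper's implementation: the function name `sinkhorn_assign`, the uniform document and topic marginals, the regularization value, and the toy affinity matrix are all assumptions made for illustration. In practice the affinities would come from an LM/LLM scoring each document against each label name.

```python
import numpy as np

def sinkhorn_assign(affinity, topic_mass=None, reg=0.1, n_iters=200):
    """Hypothetical helper: soft document-to-topic assignment via Sinkhorn.

    affinity:   (n_docs, n_topics) scores, higher means a better match.
    topic_mass: optional share of documents each topic should receive
                (assumed uniform when omitted).
    """
    affinity = np.asarray(affinity, dtype=float)
    n_docs, n_topics = affinity.shape

    # Turn affinities into costs: a high affinity is a cheap assignment.
    cost = affinity.max() - affinity

    # Marginals: every document carries equal mass (an assumption).
    a = np.full(n_docs, 1.0 / n_docs)
    if topic_mass is None:
        b = np.full(n_topics, 1.0 / n_topics)
    else:
        b = np.asarray(topic_mass, dtype=float)
        b = b / b.sum()

    # Gibbs kernel and alternating scaling updates.
    K = np.exp(-cost / reg)
    u = np.ones(n_docs)
    v = np.ones(n_topics)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)

    # Transport plan: plan[i, j] is the mass of document i sent to topic j.
    plan = u[:, None] * K * v[None, :]
    return plan, plan.argmax(axis=1)

# Toy affinities for 4 documents and 2 topics (made-up numbers).
affinity = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.7],
    [0.1, 0.9],
])
plan, labels = sinkhorn_assign(affinity)
print(labels)
```

Unlike taking a per-document argmax over affinities, the transport plan must also respect the topic marginals, so `topic_mass` is one place where an analyst's prior about topic sizes could enter under these assumptions.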
Findings
EdTM outperforms few-shot LLM classifiers and traditional clustering-based topic models.
The framework effectively incorporates analyst feedback and remains robust to noisy inputs.
Experimental results demonstrate improved coherence and flexibility of EdTM.
Abstract
Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content present in a corpus. This may be through categories obtained from an initial pass over the corpus or a desire to analyze the corpus through a predefined set of categories derived from a high-level theoretical framework (e.g., political ideology). In these scenarios, analysts desire a topic modeling approach that incorporates their understanding of the corpus while supporting various forms of interaction with the model. In this work, we present EdTM, an approach for label-name-supervised topic modeling. EdTM frames topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities and using optimal transport for making…
Taxonomy
Topics: Data Management and Algorithms · Complex Network Analysis Techniques · Advanced Text Analysis Techniques
Methods: Sparse Evolutionary Training · Linear Discriminant Analysis
