LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models
Minh Chu Xuan, Tien-Phat Nguyen, Linh Ngo Van, Dinh Viet Sang, Nguyen Thi Ngoc Diep, Trung Le

TL;DR
LLM-XTM is a framework that improves cross-lingual topic models by using large language models to refine their topics, yielding more coherent and better-aligned topics with less dependence on bilingual resources.
Contribution
It introduces a black-box, scalable method that combines LLM-guided topic refinement with self-consistency uncertainty quantification for cross-lingual topic modeling.
Findings
Achieves superior topic coherence and alignment on multilingual corpora.
Reduces reliance on bilingual dictionaries and costly LLM calls.
Provides a stable, scalable enhancement framework for cross-lingual topics.
Abstract
Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
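The abstract does not spell out how self-consistency uncertainty is computed. One common way to quantify it for a black-box LLM is to sample several responses to the same refinement prompt and treat agreement across samples as confidence. The sketch below illustrates that generic idea under stated assumptions: the LLM sampling step is not shown, and the helper names (`self_consistency_scores`, `refine_topic`) and the 0.5 agreement threshold are hypothetical choices, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Optional


def self_consistency_scores(samples: List[List[str]]) -> Dict[str, float]:
    """Return, for each word, the fraction of sampled LLM responses containing it.

    Hypothetical helper: `samples` holds several refined word lists that a
    black-box LLM produced for the same topic (sampling not shown here).
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    counts: Counter = Counter()
    for sample in samples:
        counts.update(set(sample))  # count each word at most once per sample
    n = len(samples)
    return {word: c / n for word, c in counts.items()}


def refine_topic(
    samples: List[List[str]],
    threshold: float = 0.5,  # assumed agreement cutoff, not from the paper
    top_k: Optional[int] = None,
) -> List[str]:
    """Keep words whose cross-sample agreement is at least `threshold`.

    Low agreement is read as high uncertainty, so those words are dropped.
    Ties in agreement are broken alphabetically for deterministic output.
    """
    scores = self_consistency_scores(samples)
    kept = [(w, s) for w, s in scores.items() if s >= threshold]
    kept.sort(key=lambda ws: (-ws[1], ws[0]))
    words = [w for w, _ in kept]
    return words[:top_k] if top_k is not None else words


if __name__ == "__main__":
    # Three hypothetical LLM samples refining the same topic.
    samples = [
        ["bank", "loan", "credit", "interest"],
        ["bank", "loan", "interest", "river"],
        ["bank", "credit", "loan", "money"],
    ]
    print(refine_topic(samples))
```

With these three samples, "river" and "money" appear only once (agreement 1/3) and are filtered out as uncertain, while "bank", "loan", "credit", and "interest" are kept. Only the LLM's text output is needed, which is what makes this kind of check usable when token probabilities are unavailable.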
