LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

Minh Chu Xuan; Tien-Phat Nguyen; Linh Ngo Van; Dinh Viet Sang; Nguyen Thi Ngoc Diep; Trung Le

arXiv:2605.03299·cs.CL·May 6, 2026


TL;DR

LLM-XTM is a framework that refines cross-lingual topic models with LLM-guided topic refinement and self-consistency uncertainty quantification. It reports better topic coherence and cross-lingual alignment while relying less on bilingual dictionaries and expensive LLM calls.

Contribution

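The paper introduces LLM-XTM, a black-box, scalable method that combines LLM-guided topic refinement with self-consistency uncertainty quantification for cross-lingual topic modeling. Unlike prior white-box approaches, it does not require access to token probabilities.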

Findings

1. Achieves superior topic coherence and alignment on multilingual corpora.
2. Reduces reliance on bilingual dictionaries and costly LLM calls.
3. Provides a stable, scalable enhancement framework for cross-lingual topics.

Abstract

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
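The abstract does not spell out how the self-consistency uncertainty quantification is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a black-box refiner that, given a topic's top words, returns one `(word_to_replace, replacement)` suggestion per call. The refiner is queried several times, and the majority suggestion is applied only when enough samples agree, so no token probabilities are needed. All names here (`self_consistency_refine`, `propose_replacement`, `scripted_refiner`, `min_agreement`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Suggestion = Tuple[str, str]  # (word_to_replace, replacement)


def self_consistency_refine(
    topic_words: List[str],
    propose_replacement: Callable[[List[str]], Suggestion],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[List[str], float]:
    """Sample a black-box refiner n_samples times and apply its most common
    suggestion only if the share of agreeing samples reaches min_agreement.

    Returns the (possibly unchanged) topic words and the agreement score.
    """
    votes = Counter(propose_replacement(topic_words) for _ in range(n_samples))
    (old_word, new_word), count = votes.most_common(1)[0]
    agreement = count / n_samples

    # Low agreement is treated as high uncertainty: keep the topic as it is.
    if agreement < min_agreement or old_word not in topic_words:
        return list(topic_words), agreement

    refined = [new_word if w == old_word else w for w in topic_words]
    return refined, agreement


def scripted_refiner(answers: Iterable[Suggestion]) -> Callable[[List[str]], Suggestion]:
    """Stand-in for an LLM call (assumption): replays fixed answers in order."""
    it = iter(answers)
    return lambda words: next(it)


topic = ["football", "goal", "banana", "league"]

# Four of five samples agree that "banana" is the intruder word.
consistent = scripted_refiner([("banana", "striker")] * 4 + [("goal", "apple")])
refined_topic, agreement_score = self_consistency_refine(topic, consistent)

# Samples disagree with each other, so the suggestion is rejected.
noisy = scripted_refiner(
    [("banana", "striker"), ("goal", "apple"), ("league", "pear"),
     ("banana", "striker"), ("goal", "apple")]
)
unchanged_topic, low_agreement = self_consistency_refine(topic, noisy)
```

In a real pipeline the scripted refiner would be replaced by sampled LLM responses, and the agreement score could also decide which topics are worth further LLM calls. That design choice is an assumption here, not something the abstract states.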
