GloCTM: Cross-Lingual Topic Modeling via a Global Context Space
Nguyen Tien Phat, Ngo Vu Minh, Linh Van Ngo, Nguyen Thi Ngoc Diep, Thien Huu Nguyen

TL;DR
GloCTM introduces a unified semantic framework for cross-lingual topic modeling, leveraging multilingual embeddings and enriched representations to improve coherence and alignment across languages.
Contribution
The paper proposes a global context space that aligns topics across languages using enriched inputs, dual encoders, and a Centered Kernel Alignment (CKA) loss, in contrast to earlier models that learn topics in disjoint, language-specific spaces.
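For readers unfamiliar with the CKA loss named above, the following is a minimal sketch of linear Centered Kernel Alignment between two sets of representations of the same documents. It is not the authors' implementation: this summary does not say which representations GloCTM compares or which CKA variant it uses, so the function names `linear_cka` and `cka_loss`, the `1 - CKA` loss form, and the example shapes are all assumptions made for illustration.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices.

    x has shape (n_samples, d1) and y has shape (n_samples, d2);
    row i of x and row i of y must describe the same sample.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_loss(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical alignment loss: zero when the two spaces match under CKA."""
    return 1.0 - linear_cka(x, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed shapes: 32 documents, 8 latent topic dimensions versus
    # 16 embedding dimensions. Both matrices are random placeholders.
    topic_repr = rng.normal(size=(32, 8))
    embed_repr = rng.normal(size=(32, 16))
    print("CKA loss:", cka_loss(topic_repr, embed_repr))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either input, which is why it can compare spaces of different dimensionality, such as a topic space and a pretrained embedding space.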
Findings
Significantly improves topic coherence across languages.
Achieves better cross-lingual alignment than baseline models.
Demonstrates effectiveness on multiple benchmark datasets.
Abstract
Cross-lingual topic modeling seeks to uncover coherent and semantically aligned topics across languages, a task central to multilingual understanding. Yet most existing models learn topics in disjoint, language-specific spaces and rely on alignment mechanisms (e.g., bilingual dictionaries) that often fail to capture deep cross-lingual semantics, resulting in loosely connected topic spaces. Moreover, these approaches often overlook the rich semantic signals embedded in multilingual pretrained representations, further limiting their ability to capture fine-grained alignment. We introduce GloCTM (Global Context Space for Cross-Lingual Topic Model), a novel framework that enforces cross-lingual topic alignment through a unified semantic space spanning the entire model pipeline. GloCTM constructs enriched input representations by expanding bag-of-words with cross-lingual lexical…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining
