Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models

Chia-Hsuan Chang; Tien-Yuan Huang; Yi-Hang Tsai; Chia-Ming Chang; San-Yih Hwang

arXiv:2412.12433·cs.CL·December 18, 2024

TL;DR

This paper introduces an SVD-based dimension refinement method that improves clustering-based cross-lingual topic models by neutralizing language-dependent dimensions, so that topics are identified more accurately across languages.

Contribution

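The paper proposes a novel SVD-based dimension refinement component for the pipeline of clustering-based topic models. The component neutralizes the language-dependent dimensions (LDDs) generated by multilingual language models, which otherwise hinder topic identification across languages.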

Findings

01 Generally outperforms other state-of-the-art cross-lingual topic models on three datasets

02 Effectively neutralizes the negative impact of language-dependent dimensions in multilingual representations

03 Improves the accuracy of cross-lingual topic identification

Abstract

Recent works in clustering-based topic models perform well in monolingual topic identification by introducing a pipeline to cluster the contextualized representations. However, the pipeline is suboptimal in identifying topics across languages due to the presence of language-dependent dimensions (LDDs) generated by multilingual language models. To address this issue, we introduce a novel, SVD-based dimension refinement component into the pipeline of the clustering-based topic model. This component effectively neutralizes the negative impact of LDDs, enabling the model to accurately identify topics across languages. Our experiments on three datasets demonstrate that the updated pipeline with the dimension refinement component generally outperforms other state-of-the-art cross-lingual topic models.
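
As a rough illustration of the idea in the abstract, the sketch below runs an SVD over document embeddings and drops the directions that track language before any clustering step. It is not the authors' implementation: the function name refine_dimensions, the correlation-based rule for flagging a direction as language-dependent, the 0.5 threshold, and the synthetic data are all assumptions made for this example, and the paper's actual refinement procedure may differ.

```python
import numpy as np


def refine_dimensions(embeddings, lang_labels, corr_threshold=0.5):
    """Hypothetical helper: drop SVD directions that correlate with language.

    Assumption for this sketch: a direction counts as language-dependent when
    the absolute Pearson correlation between the documents' coordinates along
    it and a binary language indicator reaches corr_threshold.
    """
    X = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Thin SVD: the rows of Vt are orthonormal directions in embedding space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt.T  # coordinates of each document along each direction

    lang = np.asarray(lang_labels, dtype=float)
    lang = lang - lang.mean()
    # Columns of Z are already zero-mean because X was centered.
    denom = np.linalg.norm(Z, axis=0) * np.linalg.norm(lang)
    corr = np.abs(Z.T @ lang) / np.where(denom > 0, denom, 1.0)

    keep = corr < corr_threshold
    return Z[:, keep], corr


def group_gap(Z, labels):
    """Distance between the mean vectors of the two label groups."""
    return np.linalg.norm(Z[labels == 1].mean(axis=0) - Z[labels == 0].mean(axis=0))


# Synthetic stand-in for multilingual embeddings (not real model output):
# axis 0 carries the topic signal, axis 1 carries a stronger language signal.
rng = np.random.default_rng(0)
n_docs, dim = 200, 16
topic = np.tile([0, 0, 1, 1], n_docs // 4)
lang = np.tile([0, 1, 0, 1], n_docs // 4)
X = rng.normal(scale=0.5, size=(n_docs, dim))
X[:, 0] += 3.0 * topic
X[:, 1] += 5.0 * lang

refined, corr = refine_dimensions(X, lang)

lang_gap_before = group_gap(X, lang)
lang_gap_after = group_gap(refined, lang)
topic_gap_after = group_gap(refined, topic)
print("kept dimensions:", refined.shape[1], "of", dim)
print("language gap before/after:", lang_gap_before, lang_gap_after)
print("topic gap after:", topic_gap_after)
```

In this toy setting the language offset dominates the raw embeddings, so a clustering step would tend to group documents by language. After the flagged directions are removed, the separation between language groups shrinks while the separation between topics is retained, which is the effect the abstract attributes to the refinement component. The refined matrix could then be passed to whichever clustering algorithm the pipeline uses.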

Code & Models

Repositories

Text-Analytics-and-Retrieval/Clustering-based-Cross-Lingual-Topic-Model (Official)

Taxonomy

Topics: Complex Network Analysis Techniques · Expert finding and Q&A systems · Computational and Text Analysis Methods