Refining Dimensions for Improving Clustering-based Cross-lingual Topic Models
Chia-Hsuan Chang, Tien-Yuan Huang, Yi-Hang Tsai, Chia-Ming Chang, San-Yih Hwang

TL;DR
This paper introduces an SVD-based dimension refinement method that improves clustering-based cross-lingual topic models by neutralizing language-dependent dimensions, so that topics are identified more accurately across languages.
Contribution
The paper proposes a novel SVD-based dimension refinement component that enhances clustering-based cross-lingual topic models by addressing language-dependent dimensions.
Findings
Generally outperforms other state-of-the-art cross-lingual topic models on three datasets
Effectively neutralizes language-dependent dimensions in multilingual representations
Improves accuracy of cross-lingual topic identification
Abstract
Recent works in clustering-based topic models perform well in monolingual topic identification by introducing a pipeline to cluster the contextualized representations. However, the pipeline is suboptimal in identifying topics across languages due to the presence of language-dependent dimensions (LDDs) generated by multilingual language models. To address this issue, we introduce a novel, SVD-based dimension refinement component into the pipeline of the clustering-based topic model. This component effectively neutralizes the negative impact of LDDs, enabling the model to accurately identify topics across languages. Our experiments on three datasets demonstrate that the updated pipeline with the dimension refinement component generally outperforms other state-of-the-art cross-lingual topic models.
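The abstract does not spell out how the refinement step works, so the following is only a minimal sketch of one way an SVD-based refinement could neutralize language-dependent dimensions; it is not the authors' implementation. It assumes document embeddings are already available as a NumPy array with one language label per document. The function name `refine_dimensions`, the `n_remove` parameter, and the between-language variance score used to pick directions are all hypothetical choices made for illustration.

```python
import numpy as np


def refine_dimensions(embeddings, lang_ids, n_remove=1):
    """Illustrative sketch (assumed procedure, not the paper's exact method).

    Finds the singular directions of the centered embeddings whose
    projections separate languages most strongly, then projects them out.
    """
    X = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Thin SVD: rows of Vt are orthonormal directions in embedding space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt.T  # projection of every document on every direction

    # Share of each direction's variance explained by language identity.
    total_var = scores.var(axis=0) + 1e-12
    between_var = np.zeros(scores.shape[1])
    for lang in np.unique(lang_ids):
        mask = lang_ids == lang
        between_var += mask.mean() * scores[mask].mean(axis=0) ** 2
    lang_ratio = between_var / total_var

    # Drop the n_remove directions most tied to language.
    drop = np.argsort(lang_ratio)[::-1][:n_remove]
    directions = Vt[drop]
    refined = X - (X @ directions.T) @ directions
    return refined, directions


def language_gap(X, lang_ids):
    """Distance between the mean embeddings of language 0 and language 1."""
    return np.linalg.norm(X[lang_ids == 0].mean(axis=0) - X[lang_ids == 1].mean(axis=0))


# Synthetic data (made up for this sketch): dimension 0 encodes language,
# dimension 1 encodes topic, and the remaining dimensions are noise.
rng = np.random.default_rng(0)
lang_ids = np.repeat([0, 1], 100)
topic_ids = np.tile([0, 1], 100)
X = rng.normal(size=(200, 8))
X[:, 0] += np.where(lang_ids == 0, -5.0, 5.0)
X[:, 1] += np.where(topic_ids == 0, -2.0, 2.0)

refined, directions = refine_dimensions(X, lang_ids, n_remove=1)
gap_before = language_gap(X, lang_ids)
gap_after = language_gap(refined, lang_ids)
print(f"language gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

In a clustering-based pipeline, the refined vectors would then be passed to the clustering step in place of the raw multilingual embeddings, so that clusters form around topics rather than around languages.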
Taxonomy
Topics: Complex Network Analysis Techniques · Expert finding and Q&A systems · Computational and Text Analysis Methods
