A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations
Ziyi Yang, Yinfei Yang, Daniel Cer, Eric Darve

TL;DR
This paper introduces a simple, post-training linear method called LIR to remove language bias from multilingual models, significantly improving cross-lingual tasks by isolating semantic content from language identity.
Contribution
The paper proposes a novel, model-agnostic linear approach using geometric algebra to eliminate language bias in multilingual representations, enhancing cross-lingual transfer.
Findings
LIR achieves nearly 100% relative improvement in MAP on LAReQA for weak-alignment models.
Removing language information improves cross-lingual transfer performance.
LIR is simple, effective, and applicable post-training without model modifications.
Abstract
Language-agnostic representation and semantic-language information isolation are an emerging research direction for multilingual representation models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. A post-training and model-agnostic method, LIR uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate LIR on a cross-lingual question answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiment…
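The abstract names matrix factorization and orthogonal projection as the only operations LIR needs. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it assumes that the language identity components of one language can be approximated by the top-r right singular vectors of that language's embedding matrix, and that removal means projecting each vector onto the orthogonal complement of those components. The function names, the choice of r, and the synthetic data are all hypothetical.

```python
import numpy as np


def language_components(embeddings: np.ndarray, r: int) -> np.ndarray:
    """Top-r principal directions of an (n, d) embedding matrix, as a (d, r) matrix.

    Assumption: these directions stand in for the language identity subspace.
    """
    # Thin SVD; the rows of vt are orthonormal directions in embedding space.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:r].T


def remove_language_info(vectors: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project (n, d) vectors onto the orthogonal complement of the components."""
    return vectors - (vectors @ components) @ components.T


# Synthetic illustration (hypothetical data): every embedding of one language
# shares a strong offset along a single direction, plus small "semantic" noise.
rng = np.random.default_rng(0)
n, d = 50, 8
lang_direction = rng.normal(size=d)
lang_direction /= np.linalg.norm(lang_direction)
embeddings = 5.0 * lang_direction + 0.1 * rng.normal(size=(n, d))

components = language_components(embeddings, r=1)
cleaned = remove_language_info(embeddings, components)

# The recovered component should align closely with the shared offset direction.
alignment = abs(float(components[:, 0] @ lang_direction))
```

In this toy setup the dominant singular vector recovers the shared offset, and the cleaned vectors have no remaining component along it. With a real multilingual encoder, the components would be estimated per language from a sample of that language's sentence embeddings and applied to new embeddings before retrieval; how many components to remove is a tuning choice that this sketch does not settle.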
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
