A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations

Ziyi Yang; Yinfei Yang; Daniel Cer; Eric Darve

arXiv:2109.04727·cs.CL·September 13, 2021

TL;DR

This paper introduces Language Information Removal (LIR), a simple post-training linear method that removes language identity information from multilingual representations. Separating semantic content from language identity substantially improves performance on cross-lingual tasks.

Contribution

The paper proposes a model-agnostic linear approach, built on simple operations such as matrix factorization and orthogonal projection, that removes language identity information from multilingual representations and improves cross-lingual transfer.

Findings

01

LIR achieves nearly 100% relative improvement in MAP on LAReQA for weak-alignment models.

02

Removing language information improves cross-lingual transfer performance.

03

LIR is simple, effective, and applicable post-training without model modifications.
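
For reading the first finding, it may help to spell out what a relative improvement in MAP means. The expression below assumes the usual definition of relative improvement; the subscripts "base" and "LIR" are labels introduced here for illustration and are not taken from the paper.

```latex
\[
\text{relative improvement}
  = \frac{\mathrm{MAP}_{\text{LIR}} - \mathrm{MAP}_{\text{base}}}{\mathrm{MAP}_{\text{base}}} \times 100\%
\]
```

Under this definition, a relative improvement of nearly 100% corresponds to MAP roughly doubling after LIR is applied.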

Abstract

Language-agnostic and semantic-language information isolation is an emerging research direction for multilingual representation models. We explore this problem from a novel angle of geometric algebra and semantic space. A simple but highly effective method, "Language Information Removal (LIR)", factors out language identity information from semantically related components in multilingual representations pre-trained on multi-monolingual data. A post-training and model-agnostic method, LIR uses only simple linear operations, e.g. matrix factorization and orthogonal projection. LIR reveals that for weak-alignment multilingual systems, the principal components of semantic spaces primarily encode language identity information. We first evaluate LIR on a cross-lingual question answer retrieval task (LAReQA), which requires strong alignment of the multilingual embedding space. Experiment…
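
The abstract describes LIR only at a high level (matrix factorization followed by orthogonal projection), so the following is a minimal sketch of that idea rather than the authors' implementation. It assumes that language identity is captured by the top `r` singular directions of a matrix of embeddings from a single language, and that those directions are removed by projecting onto their orthogonal complement. The function names, the choice of SVD without mean-centering, and the parameter `r` are all assumptions made here for illustration.

```python
import numpy as np


def language_components(embeddings: np.ndarray, r: int) -> np.ndarray:
    """Return a (d, r) orthonormal basis of the top-r singular directions.

    embeddings: (n, d) array whose rows are sentence embeddings that all
    come from one language (an assumption of this sketch).
    """
    # Rows are embeddings, so the right singular vectors live in the
    # d-dimensional embedding space, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:r].T


def remove_language_information(vectors: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project each row of `vectors` onto the orthogonal complement of `components`.

    For an orthonormal basis C this computes e - C C^T e for every row e.
    """
    return vectors - (vectors @ components) @ components.T


if __name__ == "__main__":
    # Toy data (hypothetical): weak "semantic" noise plus one strong shared
    # direction that stands in for language identity.
    rng = np.random.default_rng(0)
    d = 8
    language_direction = np.zeros(d)
    language_direction[0] = 1.0
    toy_embeddings = 0.1 * rng.normal(size=(50, d)) + 5.0 * language_direction

    basis = language_components(toy_embeddings, r=1)
    cleaned = remove_language_information(toy_embeddings, basis)

    # After projection, nothing remains along the removed direction.
    print(np.abs(cleaned @ basis).max())
```

In a multilingual setting, a sketch like this would estimate one basis per language from that language's embeddings and apply the matching projection to each embedding before retrieval. How many components to remove is a tuning choice that this page does not specify.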

Code & Models

Repositories

ziyi-yang/lir (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining