TL;DR
This paper debiases multilingual large language models in a joint cross-lingual latent space rather than directly on the model's representations, which improves both debiasing effectiveness and transferability across languages.
Contribution
The paper constructs a well-aligned cross-lingual latent space with an autoencoder trained on parallel TED talk scripts and applies existing debiasing techniques within that space, improving cross-lingual debiasing performance.
Findings
Autoencoders effectively create aligned cross-lingual latent spaces.
Debiasing in the latent space improves overall debiasing effectiveness.
Cross-lingual transferability of debiasing techniques is significantly enhanced.
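One way to make "aligned" concrete is a translation-retrieval check: if the latent space is well aligned, the nearest target-language latent to a source sentence should be its own translation. The sketch below is illustrative only. This summary does not state which alignment metric the paper uses, and `retrieval_accuracy`, the array names, and the synthetic data are all assumptions standing in for real autoencoder outputs.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def retrieval_accuracy(z_src, z_tgt):
    """Fraction of source latents whose most similar target latent is
    their own translation (row i of z_src pairs with row i of z_tgt)."""
    sims = cosine_matrix(z_src, z_tgt)
    nearest = np.argmax(sims, axis=1)
    return float(np.mean(nearest == np.arange(len(z_src))))

# Synthetic stand-ins for latents of parallel sentences (assumption:
# a real setup would obtain these from the trained autoencoder).
rng = np.random.default_rng(0)
z_en = rng.normal(size=(50, 16))
z_fr = z_en + 0.01 * rng.normal(size=(50, 16))  # near-identical latents

aligned_score = retrieval_accuracy(z_en, z_fr)
# Reversing the target order breaks every pairing, so the score drops.
misaligned_score = retrieval_accuracy(z_en, z_fr[::-1])
```

A score near 1 for parallel data and near 0 for deliberately mismatched data is what an aligned space would look like under this particular check.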
Abstract
Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. Our experiments with Aya-expanse and two debiasing techniques across four languages (English, French, German, Dutch) demonstrate that a) autoencoders effectively construct a well-aligned cross-lingual latent space, and b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both the overall debiasing performance and cross-lingual…
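To illustrate the idea of debiasing in a latent space, here is a minimal SentDebias-style sketch: estimate a bias subspace from pairs of latent vectors that differ only in a bias attribute, then project that subspace out. It is a sketch under stated assumptions, not the paper's implementation. The function names, the choice of a one-dimensional subspace, and the synthetic latents are hypothetical, and the latents are assumed to come from the autoencoder's encoder rather than from the LLM directly.

```python
import numpy as np

def bias_subspace(z_a, z_b, k=1):
    """Estimate a k-dimensional bias subspace from paired latents.

    z_a, z_b: arrays of shape (n_pairs, d) holding latents of
    counterfactual sentence pairs (row i of z_a pairs with row i of z_b).
    """
    # Centre each pair on its mean so only the within-pair offset remains.
    mean = (z_a + z_b) / 2.0
    centred = np.concatenate([z_a - mean, z_b - mean], axis=0)
    # The top right-singular vectors are the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k]  # shape (k, d), orthonormal rows

def remove_subspace(z, basis):
    """Project z onto the orthogonal complement of the rows of basis."""
    return z - (z @ basis.T) @ basis

# Synthetic latents (assumption): a shared content vector plus or minus
# a fixed bias direction g, mimicking counterfactual pairs.
rng = np.random.default_rng(0)
d = 8
g = np.zeros(d)
g[0] = 1.0
content = rng.normal(size=(20, d))
z_a = content + g
z_b = content - g

basis = bias_subspace(z_a, z_b, k=1)
debiased_a = remove_subspace(z_a, basis)
debiased_b = remove_subspace(z_b, basis)
```

In this toy setup the two members of each pair collapse to the same vector after projection, because the only thing separating them was the estimated bias direction. With real latents the separation would be noisier and the subspace dimension `k` would need tuning.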