Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
Taiming Lu, Philipp Koehn

TL;DR
This paper examines how harmful misinformation propagates across languages in multilingual LLMs and shows that effective unlearning must target harmful responses in both English and the original language of the harmful data.
Contribution
It highlights the limitations of standard, English-focused unlearning methods in multilingual settings and argues for comprehensive strategies that account for multiple languages.
Findings
Harmful information spreads across languages in LLMs.
Standard unlearning methods are insufficient for multilingual models.
Unlearning harmful responses in both English and the original language of the harmful data effectively removes harmful generations across all languages.
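As a rough illustration of the last finding, the sketch below lists which languages an unlearning pass would need to cover for each harmful item: English plus the language the item was introduced in. It is not the authors' implementation; `HarmfulItem`, `unlearning_languages`, and `build_unlearning_plan` are hypothetical names, and the sketch only plans coverage rather than performing any unlearning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmfulItem:
    # Hypothetical record: an identifier plus the language the
    # misinformation was originally introduced in.
    prompt_id: str
    source_lang: str


def unlearning_languages(item, pivot="en"):
    """Return the languages to cover for one item: the pivot (English
    by assumption) plus the item's source language, without duplicates."""
    langs = [pivot]
    if item.source_lang != pivot:
        langs.append(item.source_lang)
    return langs


def build_unlearning_plan(items, pivot="en"):
    """Expand each harmful item into (prompt_id, language) pairs that an
    unlearning pass would need to address."""
    plan = []
    for item in items:
        for lang in unlearning_languages(item, pivot):
            plan.append((item.prompt_id, lang))
    return plan


if __name__ == "__main__":
    items = [HarmfulItem("q1", "de"), HarmfulItem("q2", "en")]
    for prompt_id, lang in build_unlearning_plan(items):
        print(prompt_id, lang)
```

An English-only plan would drop the `("q1", "de")` pair, which is the setup the findings describe as insufficient.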
Abstract
This paper investigates the propagation of harmful information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. We demonstrate that fake information, regardless of the language it is in, once introduced into these models through training data, can spread across different languages, compromising the integrity and reliability of the generated content. Our findings reveal that standard unlearning techniques, which typically focus on English data, are insufficient in mitigating the spread of harmful content in multilingual contexts and could inadvertently reinforce harmful content across languages. We show that only by addressing harmful responses in both English and the original language of the harmful data can we effectively eliminate generations for all languages. This underscores the critical need for comprehensive unlearning…
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems
Methods: Focus
