LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies
Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting

TL;DR
This paper introduces M-ALERT, a multilingual safety benchmark for LLMs, revealing significant cross-linguistic safety inconsistencies and emphasizing the need for language-specific safety evaluations.
Contribution
The paper presents M-ALERT, a comprehensive multilingual safety benchmark with 75k prompts across five languages, and highlights safety inconsistencies in 39 state-of-the-art LLMs.
Findings
Models often show safety inconsistencies across languages.
Certain categories, such as substance_cannabis, consistently trigger unsafe responses across models and languages.
Language-specific safety issues are prevalent in current LLMs.
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we conduct a large-scale, comprehensive safety evaluation of the current LLM landscape. For this purpose, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations. Our extensive experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in category crime_tax for Italian but remains safe in other languages. Similar inconsistencies can be observed across all…
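The kind of per-language, per-category comparison the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' evaluation pipeline: the record layout `(language, category, is_safe)`, the function names, the `max_gap` threshold, and the toy records are all assumptions made for the example, and the toy values are not results from the paper.

```python
from collections import defaultdict


def safety_scores(records):
    """Return {category: {language: fraction of responses judged safe}}.

    `records` is an iterable of (language, category, is_safe) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [safe, total]
    for language, category, is_safe in records:
        cell = counts[category][language]
        cell[0] += int(is_safe)
        cell[1] += 1
    return {
        category: {lang: safe / total for lang, (safe, total) in by_lang.items()}
        for category, by_lang in counts.items()
    }


def inconsistent_categories(scores, max_gap=0.2):
    """Flag categories whose best and worst language differ by more than max_gap.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    flagged = {}
    for category, by_lang in scores.items():
        gap = max(by_lang.values()) - min(by_lang.values())
        if gap > max_gap:
            flagged[category] = gap
    return flagged


# Toy data, invented for illustration only (not measurements from M-ALERT).
toy_records = [
    ("en", "crime_tax", True),
    ("en", "crime_tax", True),
    ("it", "crime_tax", False),
    ("it", "crime_tax", True),
    ("en", "substance_cannabis", False),
    ("it", "substance_cannabis", False),
]

scores = safety_scores(toy_records)
flagged = inconsistent_categories(scores)
print(flagged)
```

In this toy setup, a category that is unsafe in every language is not flagged, because the gap measures inconsistency between languages rather than overall safety; a separate check on the absolute score would be needed to catch it.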
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems
