Towards Safe Multilingual Frontier AI
Artūrs Kanepajs, Vladimir Ivanov, Richard Moulange

TL;DR
This paper investigates multilingual jailbreak vulnerabilities in large language models across EU languages and proposes policy measures to enhance AI safety and inclusivity within the EU regulatory framework.
Contribution
It provides an empirical analysis of multilingual jailbreak risks in advanced AI models and offers policy recommendations aligned with EU law to improve multilingual AI safety.
Findings
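The paper examines how a language's level of resourcing relates to LLM vulnerability to multilingual jailbreaks in that language.
Five advanced AI models are tested across the 24 official languages of the EU.
Policy proposals aligned with the EU legal landscape and institutional framework aim to mitigate multilingual jailbreaks while promoting linguistic inclusivity.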
Abstract
Linguistically inclusive LLMs -- which maintain good performance regardless of the language with which they are prompted -- are necessary for the diffusion of AI benefits around the world. Multilingual jailbreaks that rely on language translation to evade safety measures undermine the safe and inclusive deployment of AI systems. We provide policy recommendations to enhance the multilingual capabilities of AI while mitigating the risks of multilingual jailbreaks. We examine how a language's level of resourcing relates to how vulnerable LLMs are to multilingual jailbreaks in that language. We do this by testing five advanced AI models across 24 official languages of the EU. Building on prior research, we propose policy actions that align with the EU legal landscape and institutional framework to address multilingual jailbreaks, while promoting linguistic inclusivity. These include…
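The comparison described in the abstract (vulnerability versus a language's level of resourcing) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `unsafe_rate_by_tier`, the tier labels, the tier assigned to each language, and the toy records are all hypothetical and serve only to show the shape of such an aggregation.

```python
from collections import defaultdict


def unsafe_rate_by_tier(records, tier_of):
    """Aggregate jailbreak outcomes by language resource tier.

    records: iterable of (language_code, is_unsafe) pairs, one per prompt.
    tier_of: dict mapping language_code -> resource tier label.
    Returns a dict mapping tier -> fraction of responses judged unsafe.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for lang, is_unsafe in records:
        tier = tier_of[lang]
        totals[tier] += 1
        unsafe[tier] += int(is_unsafe)
    return {tier: unsafe[tier] / totals[tier] for tier in totals}


# Made-up inputs purely to show the call shape; these are NOT results
# from the paper, and the tier assignments are assumptions.
tier_of = {"en": "high", "de": "high", "mt": "low", "ga": "low"}
records = [("en", False), ("de", False), ("mt", True), ("ga", False)]
rates = unsafe_rate_by_tier(records, tier_of)
```

In a real study the `is_unsafe` labels would come from human or automated judgment of each model response, and the aggregation would be repeated per model.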
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Diffusion · ALIGN
