Towards Safe Multilingual Frontier AI

Artūrs Kanepajs; Vladimir Ivanov; Richard Moulange

arXiv:2409.13708 · cs.CL · October 30, 2024

TL;DR

This paper investigates multilingual jailbreak vulnerabilities in large language models across EU languages and proposes policy measures to enhance AI safety and inclusivity within the EU regulatory framework.

Contribution

The paper provides an empirical analysis of multilingual jailbreak risks in advanced AI models and offers policy recommendations aligned with EU law to improve multilingual AI safety.

Findings

01. Vulnerabilities vary with language resource levels.

02. Testing across 24 EU languages reveals specific risks.

03. Policy proposals aim to mitigate jailbreaks and promote inclusivity.

Abstract

Linguistically inclusive LLMs -- which maintain good performance regardless of the language with which they are prompted -- are necessary for the diffusion of AI benefits around the world. Multilingual jailbreaks that rely on language translation to evade safety measures undermine the safe and inclusive deployment of AI systems. We provide policy recommendations to enhance the multilingual capabilities of AI while mitigating the risks of multilingual jailbreaks. We examine how a language's level of resourcing relates to how vulnerable LLMs are to multilingual jailbreaks in that language. We do this by testing five advanced AI models across 24 official languages of the EU. Building on prior research, we propose policy actions that align with the EU legal landscape and institutional framework to address multilingual jailbreaks, while promoting linguistic inclusivity. These include…

Code & Models

Repositories

akanepajs/multilingual (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Diffusion · ALIGN