The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

TL;DR
This paper investigates aligning AI systems to diverse global and local cultural preferences to reduce harm, using multilingual data and novel alignment techniques to improve safety across six languages.
Contribution
It introduces a multilingual red-teaming dataset and evaluates alignment methods for non-homogeneous cultural preferences, advancing safety in multilingual AI systems.
Findings
- State-of-the-art alignment across 6 languages with minimal performance loss
- Effective handling of non-stationary preference distributions across cultures
- Insights into cross-lingual transfer and optimization for global safety
Abstract
A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms. We collect the first set of human-annotated red-teaming prompts in different languages distinguishing between global and local harm, which serve as a laboratory for understanding the reliability of alignment techniques when faced with preference distributions that are non-stationary across geographies and languages. While this…
Code & Models
- 🤗 CohereLabs/aya-expanse-8b (model) · 16k dl · ♡ 423
- 🤗 CohereLabs/aya-expanse-32b (model) · 6.7k dl · ♡ 289
- 🤗 jth01/aya-expanse-8b-5.0bpw-exl2 (model) · 2 dl
- 🤗 lucyknada/CohereForAI_aya-expanse-8b-exl2 (model) · ♡ 2
- 🤗 duyntnet/aya-expanse-8b-imatrix-GGUF (model) · 47 dl
- 🤗 lucyknada/CohereForAI_aya-expanse-32b-exl2 (model) · ♡ 2
- 🤗 Andrewwwwww/aya-expanse-32b (model) · 3 dl
- 🤗 Svngoku/Aya-Expanse-8B-French (model) · 2 dl
- 🤗 QuantFactory/aya-expanse-8b-GGUF (model) · 194 dl · ♡ 5
- 🤗 duyntnet/aya-expanse-32b-imatrix-GGUF (model) · 62 dl
Taxonomy
Topics: Migration, Health and Trauma · Interpreting and Communication in Healthcare
Methods: Sparse Evolutionary Training
