The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

Aakanksha; Arash Ahmadian; Beyza Ermis; Seraphina Goldfarb-Tarrant; Julia Kreutzer; Marzieh Fadaee; Sara Hooker

arXiv:2406.18682·cs.CL·July 9, 2024

TL;DR

This paper investigates aligning AI systems to diverse global and local cultural preferences to reduce harm, using multilingual data and novel alignment techniques to improve safety across six languages.

Contribution

It introduces a multilingual red-teaming dataset and evaluates alignment methods for non-homogeneous cultural preferences, advancing safety in multilingual AI systems.

Findings

01. State-of-the-art alignment across 6 languages with minimal performance loss

02. Effective handling of non-stationary preference distributions across cultures

03. Insights into cross-lingual transfer and optimization for global safety

Abstract

A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally, preference training and safety measures often overfit to harms common in Western-centric datasets. Here, we explore the viability of different alignment approaches when balancing dual objectives: addressing and optimizing for a non-homogeneous set of languages and cultural preferences while minimizing both global and local harms. We collect the first set of human annotated red-teaming prompts in different languages distinguishing between global and local harm, which serve as a laboratory for understanding the reliability of alignment techniques when faced with preference distributions that are non-stationary across geographies and languages. While this…

Taxonomy

Topics: Migration, Health and Trauma · Interpreting and Communication in Healthcare

Methods: Sparse Evolutionary Training