Language Model Alignment in Multilingual Trolley Problems

Zhijing Jin; Max Kleiman-Weiner; Giorgio Piatti; Sydney Levine; Jiarui Liu; Fernando Gonzalez; Francesco Ortu; András Strausz; Mrinmaya Sachan; Rada Mihalcea; Yejin Choi; Bernhard Schölkopf

arXiv:2407.02273 · cs.CL · May 29, 2025 · 1 citation

TL;DR

This paper assesses how well multilingual large language models align with human moral judgments across diverse languages and cultures using a new cross-lingual dataset of trolley problem scenarios.

Contribution

It introduces MultiTP, a multilingual moral dilemma dataset, and analyzes the moral alignment of 19 LLMs in more than 100 languages, revealing cross-lingual biases and variance.

Findings

1. Significant variance in moral alignment across languages.
2. LLMs exhibit biases related to demographic and linguistic factors.
3. Alignment varies notably among different models and languages.

Abstract

We evaluate the moral alignment of LLMs with human preferences in multilingual trolley problems. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. This dataset enables the assessment of LLMs' decision-making processes in diverse linguistic contexts. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions: species, gender, fitness, status, age, and the number of lives involved. By correlating these preferences with the demographic distribution of language speakers and examining the consistency of LLM responses to various prompt paraphrasings, our findings provide insights into cross-lingual and ethical biases of LLMs and their intersection. We discover…
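To make the comparison described in the abstract concrete, the following is a minimal sketch of how per-dimension preferences could be compared between a model and human respondents. It is an illustration under stated assumptions, not the paper's implementation: the function names `preference_rates` and `misalignment`, the input format, the use of Euclidean distance, and all numbers are hypothetical.

```python
from math import sqrt

# The six moral dimensions named in the abstract.
DIMENSIONS = ["species", "gender", "fitness", "status", "age", "number"]


def preference_rates(choices):
    """Return, per dimension, the share of dilemmas in which the first group was spared.

    `choices` is an assumed input format: a list of (dimension, spared_first_group)
    pairs, one per answered dilemma. Dimensions outside DIMENSIONS raise KeyError.
    """
    totals = {d: 0 for d in DIMENSIONS}
    hits = {d: 0 for d in DIMENSIONS}
    for dimension, spared_first_group in choices:
        totals[dimension] += 1
        hits[dimension] += int(spared_first_group)
    # Dimensions with no answered dilemmas are left out rather than guessed.
    return {d: hits[d] / totals[d] for d in DIMENSIONS if totals[d] > 0}


def misalignment(model_rates, human_rates):
    """Euclidean distance between two preference vectors over their shared dimensions.

    The distance is an assumption chosen for illustration; 0.0 means identical rates.
    """
    shared = [d for d in DIMENSIONS if d in model_rates and d in human_rates]
    return sqrt(sum((model_rates[d] - human_rates[d]) ** 2 for d in shared))


# Toy usage with made-up answers and made-up human reference rates.
model = preference_rates([("species", True), ("species", False), ("age", True)])
humans = {"species": 0.5, "age": 0.6}
print(model, misalignment(model, humans))
```

Running the same computation once per language would give one score per language, which is the kind of quantity that could then be compared against speaker demographics.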

Peer Reviews

Decision: ICLR 2025 Spotlight

Reviewer 01 · Rating 5 · Confidence 5

Strengths

1. Originality Strength: The paper presents approaches by analyzing the moral judgments of large language models (LLMs) within a multilingual context. This exploration of how LLMs interact with diverse cultural perspectives on moral dilemmas is a fresh contribution to the field, providing insights into the alignment (or misalignment) between machine-generated responses and human ethical considerations.
2. Clarity Strength: Despite some complexity, the paper effectively organizes its findings an…

Weaknesses

1. Originality Weakness: While the paper attempts to explore moral judgments in a multilingual context, it does not significantly advance the discourse beyond existing literature on moral dilemmas. Many of the concepts discussed are already well-established in moral philosophy, and the paper may not provide enough innovative perspectives to stand out in a crowded field.
2. Quality Weakness: The reliability and validity of the moral dimension classifications could be questioned. The paper may la…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

The paper is well-structured and clearly written. It addresses an important new challenge in aligning LLMs' responses with human preferences in moral decision-making. The authors conducted detailed experiments and provided valuable insights. The MultiTP dataset, which spans numerous languages, could be highly useful for future studies on aligning model behavior with human ethics.

Weaknesses

One of the main concerns is the justification for aligning LLMs with demographic distributions and human preferences. This approach might introduce biases into the models. For example, in certain cultures where laws or social norms may place different values on individuals (e.g., men being valued more than women in some religions), aligning LLMs with such preferences could reinforce harmful biases. The examples provided in Figure 1 illustrate cultural biases related to age, which could extend to

Reviewer 03 · Rating 8 · Confidence 4

Strengths

**Originality**: The paper introduces an innovative approach by adapting the Moral Machine framework to evaluate LLMs across languages, creating the unique MULTITP dataset for consistent cross-lingual moral dilemma evaluation, and developing novel metrics for measuring alignment.

**Quality**: The study demonstrates rigorous methodology through a comprehensive evaluation of 19 LLMs, systematic moral dimension variations, robust prompt paraphrasing tests, and careful statistical analysis with tra…

Weaknesses

**Dataset Transformation**: Although the dataset is large and comprehensive, it is adapted from the existing *Moral Machine* experiment, raising questions about its novelty. The paper would benefit from clarifying what unique modifications or contributions were made to enhance the dataset beyond its scale and multilingual coverage.

**Translation Quality**: Maintaining high translation quality across a vast number of questions, especially in low-resource languages, is challenging. The reliance o…

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Weight Decay · Residual Connection · Multi-Head Attention