Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

Yuyan Bu; Xiaohao Liu; ZhaoXing Ren; Yaodong Yang; Juntao Dai

arXiv:2602.16660·cs.CL·February 19, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a resource-efficient multilingual safety alignment method for large language models, enhancing cross-lingual consistency with minimal supervision by improving multilingual representation collinearity.

Contribution

The paper proposes a plug-and-play Multi-Lingual Consistency loss that enforces semantic alignment across languages in a single update, reducing resource requirements.

Findings

1. Improves multilingual safety alignment across diverse languages.
2. Enhances cross-lingual generalization with limited supervision.
3. Effective across different model architectures and alignment paradigms.

Abstract

The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual…
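The collinearity idea in the abstract can be illustrated with a small numerical sketch. This is not the paper's MLC loss, whose exact form is not given here. It is one plausible reading, assuming a hypothetical `collinearity_loss` function that takes one pooled representation vector per language, stacks them as rows of a matrix, and penalizes the share of spectral energy outside the top singular value. The loss is zero when the matrix is rank-1, meaning all vectors lie along one direction.

```python
import numpy as np


def collinearity_loss(reps: np.ndarray) -> float:
    """Illustrative collinearity penalty (an assumption, not the paper's exact loss).

    reps: array of shape (n_languages, hidden_dim), one non-zero pooled
    representation per language for the same prompt.
    Returns a value in [0, 1): 0 when all rows are collinear (rank-1 matrix).
    """
    # Normalize each row so that only direction, not magnitude, matters.
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)

    # Singular values of the stacked unit vectors, in descending order.
    singular_values = np.linalg.svd(unit, compute_uv=False)
    energy = singular_values ** 2

    # Fraction of spectral energy NOT captured by the top singular direction.
    # Note: this treats v and -v as collinear, since both give a rank-1 matrix.
    return float(1.0 - energy[0] / energy.sum())


if __name__ == "__main__":
    aligned = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5]])
    scattered = np.eye(3)
    print(collinearity_loss(aligned))    # close to 0: same direction
    print(collinearity_loss(scattered))  # close to 2/3: mutually orthogonal
```

In a training setup, a differentiable version of such a term would be added to the monolingual alignment objective with a weighting coefficient. Which layer the representations are taken from and how they are pooled are further choices that this sketch leaves open.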

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- **Important problem:** The paper addresses the problem of ensuring safe and consistent behavior across languages in LLMs.
- **Conceptually elegant and technically sound:** The proposed spectral regularization via rank-1 optimization is simple yet well motivated and theoretically grounded.
- **Strong empirical results:** Comprehensive evaluations across datasets, languages, and base alignment paradigms demonstrate consistent gains, especially for low-resource settings.
- **Practical and efficie

Weaknesses

- **Weak related work discussion:** The discussion of multilingual alignment baselines (e.g., MPO, SDRRL) is both incomplete and difficult to follow. The main paper only names them without explanation, forcing readers to consult the appendix, which is itself hard to follow. As a result, it is difficult to understand how these baselines differ conceptually or why they are appropriate points of comparison.
- **Limited baselines:** The paper lacks an upper-bound comparison, e.g., training with full

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- Clear motivation: Addresses a real and underexplored challenge: multilingual imbalance in LLM safety alignment.
- Conceptual simplicity: The MLC loss is an elegant addition that can integrate easily with existing pipelines.
- Empirical breadth: Includes multiple backbones (Qwen, Gemma), alignment paradigms (DPO, SFT, SimPO, ORPO), and both in- and out-of-distribution tests.
- Data efficiency: Claims strong multilingual gains with minimal additional data (∼1.8M tokens vs. 15M+ for comparable

Weaknesses

- Incremental contribution: The MLC loss is essentially a regularization of multilingual representations, conceptually simple and not a fundamentally new paradigm.
- Theoretical shallowness: Despite heavy mathematical framing (singular value decomposition, spectral view), the theoretical section adds little genuine insight beyond enforcing collinearity.
- Experimental bias: Evaluations rely on safety datasets constructed in English, potentially conflating multilingual improvement with transl

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The objective is intuitive and effective, and it does not rely on any anchor language.
2. The auxiliary loss objective can be generalized and integrated into any post-training safety paradigm.
3. The approach substantially improves the safety performance of low-resource languages while retaining that of high-resource languages.

Weaknesses

1. The approach is potentially sensitive to hyperparameters such as layer selection. The layer where representation alignment is most effective also seems task-specific.
2. Scaling behavior of the objective is not tested beyond 7B. Divergence across languages may be beneficial for even larger models, where the consistency objective may not be effective.

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Model-Driven Software Engineering Techniques