Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
Yuyan Bu, Xiaohao Liu, ZhaoXing Ren, Yaodong Yang, Juntao Dai

TL;DR
This paper introduces a resource-efficient multilingual safety alignment method for large language models, enhancing cross-lingual consistency with minimal supervision by improving multilingual representation collinearity.
Contribution
The paper proposes a plug-and-play Multi-Lingual Consistency loss that enforces semantic alignment across languages in a single update, reducing resource requirements.
Findings
Improves multilingual safety alignment across diverse languages.
Enhances cross-lingual generalization with limited supervision.
Effective across different model architectures and alignment paradigms.
Abstract
The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual…
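The abstract describes the MLC loss only at a high level (improving collinearity between multilingual representation vectors), and the exact formula is not given here. As a rough illustration of the idea, and not the paper's actual formulation, the sketch below scores how close a set of per-language representation vectors is to lying along one shared direction. It assumes each vector is first scaled to unit length, and it uses the share of spectral energy in the top singular value as the collinearity score. All function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def gram_matrix(rows):
    """Pairwise dot products; for unit rows these are cosine similarities."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def top_eigenvalue(mat, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    n = len(mat)
    # Non-uniform start vector, to reduce the chance of starting
    # orthogonal to the dominant eigenvector.
    vec = normalize([1.0 / (i + 1) for i in range(n)])
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged vector.
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(vec, mv))


def collinearity_loss(representations):
    """Illustrative loss (an assumption, not the paper's definition).

    With k unit-length rows, the squared singular values of the
    representation matrix sum to k and equal the eigenvalues of its
    Gram matrix. The loss is 1 minus the fraction of that total held
    by the largest one, so it is 0 exactly when the matrix is rank 1.
    """
    rows = [normalize(r) for r in representations]
    k = len(rows)
    return 1.0 - top_eigenvalue(gram_matrix(rows)) / k


if __name__ == "__main__":
    # Hypothetical hidden states of one prompt in three languages.
    aligned = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5]]
    scattered = [[1.0, 0.0], [0.0, 1.0]]
    print(collinearity_loss(aligned))
    print(collinearity_loss(scattered))
```

Under these assumptions, vectors that share a direction give a loss near 0, while two orthogonal vectors give 0.5. A score of this kind is rank-based, so it also treats exactly opposite vectors as collinear; a training objective aiming at directional consistency would need to handle that case separately.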
Peer Reviews
Decision: ICLR 2026 Poster
- **Important problem:** The paper addresses the problem of ensuring safe and consistent behavior across languages in LLMs.
- **Conceptually elegant and technically sound:** The proposed spectral regularization via rank-1 optimization is simple yet well motivated and theoretically grounded.
- **Strong empirical results:** Comprehensive evaluations across datasets, languages, and base alignment paradigms demonstrate consistent gains, especially for low-resource settings.
- **Practical and efficie
- **Weak related work discussion:** The discussion of multilingual alignment baselines (e.g., MPO, SDRRL) is both incomplete and difficult to follow. The main paper only names them without explanation, forcing readers to consult the appendix, which is itself hard to follow. As a result, it is difficult to understand how these baselines differ conceptually or why they are appropriate points of comparison.
- **Limited baselines:** The paper lacks an upper-bound comparison, e.g., training with full
- Clear motivation: Addresses a real and underexplored challenge: multilingual imbalance in LLM safety alignment.
- Conceptual simplicity: The MLC loss is an elegant addition that can integrate easily with existing pipelines.
- Empirical breadth: Includes multiple backbones (Qwen, Gemma), alignment paradigms (DPO, SFT, SimPO, ORPO), and both in- and out-of-distribution tests.
- Data efficiency: Claims strong multilingual gains with minimal additional data (∼1.8M tokens vs. 15M+ for comparable
- Incremental contribution: The MLC loss is essentially a regularization of multilingual representations, conceptually simple and not a fundamentally new paradigm.
- Theoretical shallowness: Despite heavy mathematical framing (singular value decomposition, spectral view), the theoretical section adds little genuine insight beyond enforcing collinearity.
- Experimental bias: Evaluations rely on safety datasets constructed in English, potentially conflating multilingual improvement with transl
1. The objective is intuitive and effective, and it does not rely on any anchor language.
2. The auxiliary loss can be generalized to and integrated into any post-training safety paradigm.
3. The approach substantially improves the safety performance of low-resource languages while retaining that of high-resource languages.
1. The approach is potentially sensitive to hyperparameters such as layer selection. The layer at which representation alignment is most effective also seems to be task-specific.
2. The scaling behavior of the objective is not tested beyond 7B. Divergence across languages may be beneficial for even larger models, where the consistency objective may not be effective.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Model-Driven Software Engineering Techniques
