Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment

Somnath Banerjee; Sayan Layek; Pratyush Chatterjee; Animesh Mukherjee; Rima Hazra

arXiv:2502.11244 · cs.CL · August 25, 2025

TL;DR

Soteria is a novel method that minimally adjusts language-specific functional parameters in multilingual LLMs to improve safety and reduce harmful content generation across diverse languages.

Contribution

The paper introduces Soteria, a lightweight approach to language-specific safety tuning that adjusts functional heads, and presents XThreatBench, a new multilingual dataset for evaluating harmful behaviors.

Findings

01

Soteria significantly reduces policy violations across multiple languages.

02

The approach maintains overall model performance in low-resource settings.

03

Experiments demonstrate improved safety metrics in open-source LLMs.
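One simple way to make "reduces policy violations across multiple languages" concrete is a per-language violation rate (the share of responses a safety judge flags as breaking policy), compared before and after steering. The sketch below is a hypothetical illustration of that bookkeeping only: the `(language, is_violation)` record format and the names `violation_rate` and `reduction` are assumptions, and the paper's exact metric definitions are not given on this page.

```python
from collections import defaultdict


def violation_rate(records):
    """Map each language to the fraction of its responses flagged as violations.

    `records` is an iterable of (language, is_violation) pairs; this format is
    a hypothetical stand-in for judged model outputs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for lang, is_violation in records:
        totals[lang] += 1
        hits[lang] += int(is_violation)
    return {lang: hits[lang] / totals[lang] for lang in totals}


def reduction(before, after):
    """Absolute drop in violation rate per language present in both runs."""
    return {lang: before[lang] - after[lang] for lang in before if lang in after}
```

Reporting the drop per language, rather than one pooled number, keeps high-, mid-, and low-resource languages visible separately.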

Abstract

Ensuring consistent safety across multiple languages remains a significant challenge for large language models (LLMs). We introduce Soteria, a lightweight yet powerful strategy that locates and minimally adjusts the "functional heads" most responsible for harmful content generation in each language. By altering only a fraction of parameters, Soteria drastically reduces policy violations without sacrificing overall model performance, even in low-resource settings. To rigorously evaluate our approach, we also present XThreatBench, a specialized multilingual dataset capturing fine-grained harmful behaviors drawn from real policy guidelines. Experiments with leading open-source LLMs (e.g., Llama, Qwen, Mistral) show that Soteria consistently improves safety metrics across high-, mid-, and low-resource languages. These findings highlight a promising path toward scalable, linguistically…
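The abstract does not spell out how the functional heads are located or how they are adjusted. As a rough illustration only, the sketch below assumes per-head importance scores for one language are already available (how they are computed is not shown), keeps the top fraction of heads, and subtracts a "harm vector" (the difference between a harm-tuned copy of the weights and the base weights, in the spirit of task arithmetic) only inside those heads. The names `select_heads` and `steer`, the `(layers, heads, head_dim)` parameter layout, and the `alpha` scale are all hypothetical; this is not the authors' implementation.

```python
import numpy as np


def select_heads(scores: np.ndarray, fraction: float) -> np.ndarray:
    """Return a boolean mask marking the top `fraction` of heads by score.

    `scores` has shape (num_layers, num_heads); a higher score is assumed to
    mean the head matters more for the target language (hypothetical).
    """
    flat = scores.ravel()
    k = max(1, int(round(fraction * flat.size)))
    top = np.argsort(flat)[::-1][:k]  # indices of the k largest scores
    mask = np.zeros(flat.size, dtype=bool)
    mask[top] = True
    return mask.reshape(scores.shape)


def steer(base: np.ndarray, harmful: np.ndarray, head_mask: np.ndarray,
          alpha: float = 1.0) -> np.ndarray:
    """Subtract the harm vector (harmful - base) only inside selected heads.

    `base` and `harmful` hold per-head parameter blocks with the hypothetical
    shape (num_layers, num_heads, head_dim); unselected heads are unchanged.
    """
    harm_vector = harmful - base
    m = head_mask[..., None].astype(base.dtype)  # broadcast over head_dim
    return base - alpha * m * harm_vector
```

Restricting the update to a small head mask is what keeps the edit minimal in this sketch: every parameter outside the chosen heads is returned exactly as it was.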

Code & Models

Repositories

neuralsentinel/soteria (Official)

Datasets

SoftMINER-Group/Soteria
Dataset · 5 downloads

Videos

Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment · Underline

Taxonomy

Topics: Occupational Health and Safety Research · Risk and Safety Analysis