Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms

Vaibhav Shukla; Hardik Sharma; Adith N Reganti; Soham Wasmatkar; Bagesh Kumar; Vrijendra Singh

arXiv:2602.07963 · cs.CL · February 10, 2026

TL;DR

This study evaluates how safety harms in large language models transfer across languages using a new multilingual benchmark, revealing significant challenges in maintaining safety standards in non-English languages.

Contribution

Introduces CompositeHarm, a multilingual benchmark combining adversarial and real-world harms, and analyzes safety transfer across six languages with scalable, energy-efficient evaluation methods.

Findings

01. Attack success rates increase in Indic languages, especially with adversarial syntax.

02. Contextual harms transfer more moderately across languages.

03. Lightweight inference strategies enable scalable, environmentally friendly multilingual safety testing.

Abstract

Most safety evaluations of large language models (LLMs) remain anchored in English. Translation is often used as a shortcut to probe multilingual behavior, but it rarely captures the full picture, especially when harmful intent or structure morphs across languages. Some types of harm survive translation almost intact, while others distort or disappear. To study this effect, we introduce CompositeHarm, a translation-based benchmark designed to examine how safety alignment holds up as both syntax and semantics shift. It combines two complementary English datasets, AttaQ, which targets structured adversarial attacks, and MMSafetyBench, which covers contextual, real-world harms, and extends them into six languages: English, Hindi, Assamese, Marathi, Kannada, and Gujarati. Using three large models, we find that attack success rates rise sharply in Indic languages, especially under…
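
As a rough illustration of the metric the abstract refers to, the sketch below computes a per-language attack success rate: the fraction of prompts in each language whose model response was judged harmful. The record layout, the `attack_success_rate` helper, and the language labels `xx` and `yy` are assumptions made for this example; they are not taken from the paper, and the toy judgments are placeholders rather than reported results.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return {language: fraction of prompts judged as successful attacks}.

    Each record is a dict with a hypothetical layout:
      "lang"             -> language label of the prompt
      "attack_succeeded" -> True if the response was judged harmful
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rec in records:
        totals[rec["lang"]] += 1
        if rec["attack_succeeded"]:
            successes[rec["lang"]] += 1
    return {lang: successes[lang] / totals[lang] for lang in totals}


# Placeholder judgments for two unnamed languages (not data from the paper).
toy_records = [
    {"lang": "xx", "attack_succeeded": False},
    {"lang": "xx", "attack_succeeded": False},
    {"lang": "xx", "attack_succeeded": True},
    {"lang": "xx", "attack_succeeded": False},
    {"lang": "yy", "attack_succeeded": True},
    {"lang": "yy", "attack_succeeded": False},
]

rates = attack_success_rate(toy_records)
for lang, rate in sorted(rates.items()):
    print(f"{lang}: {rate:.2f}")
```

Comparing such rates across the translated versions of the same prompt set is one simple way to see whether a harm category transfers across languages; how the paper actually judges a response as harmful is not specified in the text above.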

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Explainable Artificial Intelligence (XAI)