ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
Michael S. Lee, Yash Maurya, Drew Rein, Bert Herring, Jonathan Nguyen, Kyungho Song, Udari Madhushani Sehwag, Jiyeon Cho, Kaustubh Deshpande, Yeongkyun Jang, Jiyeon Joo, Minn Seok Choi, Evi Fuelle, Christina Q Knight, Joseph Brandifino, Max Fenkell

TL;DR
This paper introduces ROK-FORTRESS, a bilingual benchmark assessing how language and geopolitical context influence safety evaluations of large language models, revealing significant interactions missed by translation-only benchmarks.
Contribution
The paper presents a novel transcreation matrix methodology and a culturally adversarial benchmark for evaluating multilingual safety risks in LLMs, focusing on English-Korean and U.S.-ROK geopolitical contexts.
Findings
Korean variants show a consistent suppression effect in safety responses.
Geopolitical grounding can mitigate language-driven safety suppression.
Model responses vary significantly based on language and geopolitical context.
Abstract
Safety evaluations for large language models (LLMs) increasingly target high-stakes National Security and Public Safety (NSPS) risks, yet multilingual safety is typically assessed through translation-only benchmarks that preserve the underlying scenario, and empirical evidence of how language and geopolitical context interact remains limited to a narrow set of language pairs. We introduce ROK-FORTRESS (https://huggingface.co/datasets/ScaleAI/ROK-FORTRESS_public), a bilingual, culturally adversarial NSPS benchmark that uses the English-Korean language pair and U.S.-ROK geopolitical axis as a case study. It separates the effects of language and geopolitical grounding via a transcreation matrix: adversarial intents are evaluated under controlled combinations of (i) English versus Korean language and (ii) U.S. versus Korean entities, institutions, and operational details. Each…
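The two-factor design described in the abstract can be pictured as a 2x2 grid of conditions per adversarial intent. The sketch below is only an illustration of that idea, not the authors' code: the cell labels (`en`, `ko`, `us`, `rok`), the function names, and the choice of a simple difference-of-means decomposition are all assumptions, and the score passed in is a hypothetical per-cell mean rather than the benchmark's actual metric.

```python
from itertools import product

# Hypothetical labels for the two factors named in the abstract.
LANGUAGES = ("en", "ko")   # prompt language
CONTEXTS = ("us", "rok")   # entities, institutions, operational details


def transcreation_cells():
    """Enumerate the four (language, context) conditions of the 2x2 grid."""
    return list(product(LANGUAGES, CONTEXTS))


def decompose(scores):
    """Split per-cell mean scores into two main effects and an interaction.

    `scores` maps each (language, context) pair to a mean score; what the
    score measures is left to the caller (an assumption of this sketch).
    """
    missing = [cell for cell in transcreation_cells() if cell not in scores]
    if missing:
        raise ValueError(f"missing cells: {missing}")

    en_us, en_rok = scores[("en", "us")], scores[("en", "rok")]
    ko_us, ko_rok = scores[("ko", "us")], scores[("ko", "rok")]

    return {
        # Average change when switching English -> Korean, context held fixed.
        "language_effect": ((ko_us + ko_rok) - (en_us + en_rok)) / 2,
        # Average change when switching U.S. -> ROK grounding, language fixed.
        "context_effect": ((en_rok + ko_rok) - (en_us + ko_us)) / 2,
        # How much the context effect differs between Korean and English.
        "interaction": (ko_rok - ko_us) - (en_rok - en_us),
    }
```

A translation-only benchmark would populate only the `("en", "us")` and `("ko", "us")` cells, so the context effect and the interaction term could not be estimated from it; varying both factors is what makes them measurable.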
