DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li

TL;DR
DuoGuard introduces a two-player RL framework where a generator and guardrail model co-evolve to create synthetic multilingual safety data, significantly improving safety performance across languages while being efficient and scalable.
Contribution
We propose a novel adversarial RL framework for multilingual guardrail training, formalize it as a two-player game with proven convergence to a Nash equilibrium, and demonstrate that it outperforms existing guardrail models on safety benchmarks.
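To make the idea of a two-player game that settles at a Nash equilibrium concrete, here is a minimal toy sketch. It is not DuoGuard's training procedure: it swaps in the generic Hedge (multiplicative weights) rule on a tiny zero-sum matrix game, and the payoff matrix, the function name `hedge_self_play`, and the "generator"/"guardrail" labels on the two players are all assumptions made for illustration. It relies on the standard result that, in a zero-sum game, the time-averaged strategies of two no-regret learners approach a Nash equilibrium.

```python
import math

# Hypothetical 2x2 zero-sum game (values made up for illustration).
# PAYOFF[i][j] is the reward to the row player (stand-in "generator")
# and the loss to the column player (stand-in "guardrail").
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def softmax(logits):
    """Turn cumulative scores into a mixed strategy (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hedge_self_play(payoff, rounds=20000, lr=0.01):
    """Run Hedge for both players and return their time-averaged strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_logits = [0.0] * n_rows
    col_logits = [0.0] * n_cols
    row_avg = [0.0] * n_rows
    col_avg = [0.0] * n_cols

    for _ in range(rounds):
        p = softmax(row_logits)  # row player's current mixed strategy
        q = softmax(col_logits)  # column player's current mixed strategy

        # Expected payoff of each pure action against the opponent's current mix.
        row_gain = [sum(payoff[i][j] * q[j] for j in range(n_cols))
                    for i in range(n_rows)]
        col_loss = [sum(payoff[i][j] * p[i] for i in range(n_rows))
                    for j in range(n_cols)]

        # Row player moves toward high payoff, column player away from high loss.
        row_logits = [s + lr * g for s, g in zip(row_logits, row_gain)]
        col_logits = [s - lr * c for s, c in zip(col_logits, col_loss)]

        # Accumulate the running average of the strategies actually played.
        row_avg = [a + x / rounds for a, x in zip(row_avg, p)]
        col_avg = [a + x / rounds for a, x in zip(col_avg, q)]

    return row_avg, col_avg


row_avg, col_avg = hedge_self_play(PAYOFF)
# Solving the indifference conditions by hand gives an equilibrium in which
# each player picks its first action with probability 0.4.
print("row player average strategy:", row_avg)
print("column player average strategy:", col_avg)
```

For this made-up matrix, the averaged strategies should land near the hand-computed equilibrium. The individual iterates can keep oscillating around it, which is why the sketch reports averages and not the final strategies.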
Findings
Achieves nearly 10% improvement over LlamaGuard3 (8B) on English safety benchmarks.
Outperforms state-of-the-art models while being 4.5x faster at inference and smaller in size.
Effectively addresses safety data imbalance in low-resource languages.
Abstract
The rapid advancement of large language models (LLMs) has increased the need for guardrail models to ensure responsible use, particularly in detecting unsafe and illegal content. While substantial safety data exist in English, multilingual guardrail modeling remains underexplored due to the scarcity of open-source safety data in other languages. To address this gap, we propose a novel two-player Reinforcement Learning (RL) framework, where a generator and a guardrail model co-evolve adversarially to produce high-quality synthetic data for multilingual guardrail training. We theoretically formalize this interaction as a two-player game, proving convergence to a Nash equilibrium. Empirical evaluations show that our model DuoGuard outperforms state-of-the-art models, achieving nearly 10% improvement over LlamaGuard3 (8B) on English benchmarks while being 4.5x faster at inference with a…
Taxonomy
Topics: Natural Language Processing Techniques
