OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
Thomas Wang, Haowen Li

TL;DR
OpenGuardrails is an open-source platform that unifies safety detection, manipulation defense, and deployment infrastructure for large language models, offering configurable policies, a single-model guard architecture, and scalable design for enterprise safety.
Contribution
It introduces a fully open-source, scalable guardrails platform with a unified architecture and configurable policies, advancing safety and robustness in large language model deployment.
Findings
Supports 119 languages with state-of-the-art multilingual safety performance.
Compresses a 14B model to 3.3B while maintaining over 98% accuracy.
Achieves comprehensive safety coverage against content violations, attacks, and data leaks.
Abstract
As large language models (LLMs) are increasingly integrated into real-world applications, ensuring their safety, robustness, and privacy compliance has become critical. We present OpenGuardrails, the first fully open-source platform that unifies large-model-based safety detection, manipulation defense, and deployable guardrail infrastructure. OpenGuardrails protects against three major classes of risks: (1) content-safety violations such as harmful or explicit text generation, (2) model-manipulation attacks including prompt injection, jailbreaks, and code-interpreter abuse, and (3) data leakage involving sensitive or private information. Unlike prior modular or rule-based frameworks, OpenGuardrails introduces three core innovations: (1) a Configurable Policy Adaptation mechanism that allows per-request customization of unsafe categories and sensitivity thresholds; (2) a Unified…
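The abstract describes Configurable Policy Adaptation only at a high level: the unsafe categories and sensitivity thresholds can be customized per request. The Python sketch here illustrates that idea under stated assumptions. It assumes a guard model that emits a score in [0, 1] for each risk category. It also assumes a policy object sent with each request, which selects the categories that count as unsafe and sets how high a score must be to trigger a flag. All names (`RequestPolicy`, `apply_policy`, `RISK_CLASSES`, and the category labels) are hypothetical. They are not OpenGuardrails' actual API or taxonomy.

```python
from dataclasses import dataclass, field

# Illustrative category names grouped by the three risk classes in the abstract.
# These labels are assumptions, not OpenGuardrails' actual taxonomy.
RISK_CLASSES = {
    "content_safety": {"violence", "sexual_content"},
    "manipulation": {"prompt_injection", "jailbreak", "code_interpreter_abuse"},
    "data_leakage": {"pii", "credentials"},
}
KNOWN_CATEGORIES = set().union(*RISK_CLASSES.values())


@dataclass
class RequestPolicy:
    """Hypothetical per-request policy: enabled categories plus detection thresholds."""
    enabled_categories: set[str]
    # Default threshold in [0, 1]; a lower value makes detection more sensitive.
    threshold: float = 0.5
    # Optional per-category overrides of the default threshold.
    overrides: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject category names that are not in the illustrative taxonomy.
        unknown = (self.enabled_categories | set(self.overrides)) - KNOWN_CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")

    def threshold_for(self, category: str) -> float:
        return self.overrides.get(category, self.threshold)


def apply_policy(scores: dict[str, float], policy: RequestPolicy) -> list[str]:
    """Return enabled categories whose guard-model score meets the request's threshold."""
    return sorted(
        cat
        for cat, score in scores.items()
        if cat in policy.enabled_categories and score >= policy.threshold_for(cat)
    )


# The same (hypothetical) guard-model scores, judged under two different policies.
scores = {"violence": 0.62, "prompt_injection": 0.91, "pii": 0.40}
strict = RequestPolicy({"violence", "prompt_injection", "pii"}, threshold=0.35)
lenient = RequestPolicy({"prompt_injection"}, threshold=0.8)
print(apply_policy(scores, strict))   # ['pii', 'prompt_injection', 'violence']
print(apply_policy(scores, lenient))  # ['prompt_injection']
```

In this sketch the policy travels with the request and is not fixed inside the model. The same detector scores can therefore produce different verdicts for different callers, which is one plausible reading of "per-request customization." The paper's actual mechanism may differ.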
