OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models

Thomas Wang; Haowen Li

arXiv:2510.19169 · cs.CR · October 30, 2025

TL;DR

OpenGuardrails is an open-source platform that unifies safety detection, manipulation defense, and deployment infrastructure for large language models, offering configurable policies, a single-model guard architecture, and scalable design for enterprise safety.

Contribution

The paper introduces a fully open-source, scalable guardrails platform with a unified architecture and configurable policies, aimed at improving safety and robustness in large language model deployment.

Findings

01. Supports 119 languages with state-of-the-art multilingual safety performance.

02. Compresses a 14B model to 3.3B while maintaining over 98% accuracy.

03. Achieves comprehensive safety coverage against content violations, attacks, and data leaks.

Abstract

As large language models (LLMs) are increasingly integrated into real-world applications, ensuring their safety, robustness, and privacy compliance has become critical. We present OpenGuardrails, the first fully open-source platform that unifies large-model-based safety detection, manipulation defense, and deployable guardrail infrastructure. OpenGuardrails protects against three major classes of risks: (1) content-safety violations such as harmful or explicit text generation, (2) model-manipulation attacks including prompt injection, jailbreaks, and code-interpreter abuse, and (3) data leakage involving sensitive or private information. Unlike prior modular or rule-based frameworks, OpenGuardrails introduces three core innovations: (1) a Configurable Policy Adaptation mechanism that allows per-request customization of unsafe categories and sensitivity thresholds; (2) a Unified…
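
The per-request customization of unsafe categories and sensitivity thresholds described in the abstract can be illustrated with a small sketch. This is not the OpenGuardrails API: the category names, the `RequestPolicy` class, the `apply_policy` function, and the score values are all hypothetical, and the sketch assumes the detector returns one unsafe probability per category, which this excerpt does not confirm.

```python
from dataclasses import dataclass

# Hypothetical category labels; the platform's real taxonomy is not given here.
ALL_CATEGORIES = frozenset(
    {"violence", "sexual_content", "prompt_injection", "data_leak"}
)


@dataclass(frozen=True)
class RequestPolicy:
    """Per-request policy: which categories to enforce and how strictly."""

    enabled: frozenset = ALL_CATEGORIES
    threshold: float = 0.5  # assumed convention: lower means more sensitive


def apply_policy(scores: dict[str, float], policy: RequestPolicy) -> list[str]:
    """Return enabled categories whose score meets the threshold, sorted by name."""
    return sorted(
        category
        for category, score in scores.items()
        if category in policy.enabled and score >= policy.threshold
    )


# Stand-in for detector output: an assumed unsafe probability per category.
scores = {
    "violence": 0.62,
    "sexual_content": 0.05,
    "prompt_injection": 0.41,
    "data_leak": 0.10,
}

# Two callers evaluate the same scores under different policies.
strict = RequestPolicy(threshold=0.4)
lenient = RequestPolicy(
    enabled=frozenset({"prompt_injection", "data_leak"}),
    threshold=0.7,
)

flagged_strict = apply_policy(scores, strict)
flagged_lenient = apply_policy(scores, lenient)
print(flagged_strict)   # ['prompt_injection', 'violence']
print(flagged_lenient)  # []
```

Because the policy is applied to the scores after detection, the same detector output can serve callers with different risk tolerances without retraining or redeploying a model, which is one plausible reading of what per-request configuration buys.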
