TL;DR
GLiGuard is a compact, schema-conditioned bidirectional encoder for LLM safety classification that achieves high accuracy and efficiency, enabling scalable real-time content moderation across multiple safety dimensions.
Contribution
Introduces GLiGuard, a 0.3B-parameter schema-conditioned encoder that performs multi-aspect safety evaluation efficiently, rivaling larger autoregressive models in accuracy.
Findings
GLiGuard achieves competitive F1 scores while being 23-90x smaller than the models it is compared against.
It delivers up to 16x higher throughput and 17x lower latency.
It performs well across nine safety benchmarks.
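As a quick sanity check on the size claim, the arithmetic can be sketched as follows. This assumes the 23-90x range is simply the ratio between the 7B-27B guardrail models mentioned in the abstract and GLiGuard's 0.3B parameters; the summary does not state this derivation explicitly, and the variable names are illustrative.

```python
# Parameter counts in billions, taken from the abstract.
GLIGUARD_PARAMS_B = 0.3
BASELINE_PARAMS_B = (7.0, 27.0)  # smallest and largest autoregressive guardrails cited

# Assumption: "23-90x smaller" is the plain ratio of parameter counts.
ratios = [round(b / GLIGUARD_PARAMS_B) for b in BASELINE_PARAMS_B]
print(ratios)  # [23, 90]
```

Under that assumption the two endpoints match the reported range.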
Abstract
Ensuring safe, policy-compliant outputs from large language models requires real-time content moderation that can scale across multiple safety dimensions. However, state-of-the-art guardrail models rely on autoregressive decoders with 7B-27B parameters, reformulating what is fundamentally a classification problem as sequential text generation, a design choice that incurs high latency and scales poorly to multi-aspect evaluation. In this work, we introduce GLiGuard, a 0.3B-parameter schema-conditioned bidirectional encoder adapted from GLiNER2 for LLM content moderation. The key idea is to encode task definitions and label semantics directly into the input sequence as structured token schemas, enabling simultaneous evaluation of prompt safety, response safety, refusal detection, 14 fine-grained harm categories, and 11 jailbreak strategies in a single non-autoregressive forward…
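To make the idea of encoding tasks and labels into the input sequence concrete, here is a minimal sketch of how such a schema might be flattened in front of the text to be moderated. It is an illustration under stated assumptions, not GLiGuard's actual input format: the marker tokens `[TASK]`, `[LABEL]`, and `[SEP]`, the function `build_schema_input`, the example label names, and the whitespace tokenizer are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical marker tokens; the real GLiGuard/GLiNER2 vocabulary may differ.
TASK, LABEL, SEP = "[TASK]", "[LABEL]", "[SEP]"


def build_schema_input(
    schema: Dict[str, List[str]], text: str
) -> Tuple[List[str], Dict[str, Dict[str, int]]]:
    """Flatten a task -> labels schema and the text into one token sequence.

    Returns the tokens and, for each task, the index of every label's marker
    token. A bidirectional encoder could read one score per label from those
    positions, so all tasks are covered by a single forward pass.
    """
    tokens: List[str] = []
    label_positions: Dict[str, Dict[str, int]] = {}
    for task, labels in schema.items():
        tokens += [TASK, task]
        label_positions[task] = {}
        for label in labels:
            # Record where this label's [LABEL] marker sits in the sequence.
            label_positions[task][label] = len(tokens)
            tokens += [LABEL, label]
    tokens.append(SEP)
    # Whitespace splitting stands in for a real subword tokenizer.
    tokens += text.split()
    return tokens, label_positions


# Illustrative schema with two of the tasks named in the abstract.
schema = {
    "prompt_safety": ["safe", "unsafe"],
    "refusal": ["refusal", "compliance"],
}
tokens, positions = build_schema_input(schema, "How do I reset my password?")
print(tokens)
print(positions)
```

The point of the layout is that adding another task or label only lengthens the input; it does not add decoding steps, which is where an autoregressive guardrail would pay extra latency.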
