GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy

Bogdan Minko; Sabrina Sadiekh; Evgeniy Kokuykin

arXiv:2605.05277 · cs.CR · May 8, 2026

TL;DR

GLiNER Guard (GLiGuard) is a unified encoder that handles safety moderation and PII detection in a single forward pass for production LLM systems. It ships in three variants that trade throughput against moderation quality, alongside a new span-level benchmark, PII-Bench.

Contribution

A novel unified encoder model that performs safety classification and PII detection in a single pass, with variants optimized for throughput and quality, and the release of PII-Bench for evaluation.
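As a rough illustration of what "a single pass" can hand to downstream code, the sketch below models one result object carrying both a safety label and PII spans, then uses the spans to redact text. The names `PiiSpan`, `GuardResult`, and `redact`, as well as the label values, are hypothetical and are not taken from the GLiGuard release; the result is hand-constructed rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PiiSpan:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative label name, e.g. "EMAIL"


@dataclass
class GuardResult:
    safety_label: str  # assumed label set, e.g. "safe" / "unsafe"
    pii_spans: List[PiiSpan] = field(default_factory=list)


def redact(text: str, result: GuardResult) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(result.pii_spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + f"[{span.label}]" + out[span.end:]
    return out


text = "Contact me at ana@example.com today."
# Hand-built stand-in for a model output (not real GLiGuard output).
result = GuardResult("safe", [PiiSpan(14, 29, "EMAIL")])
print(redact(text, result))  # Contact me at [EMAIL] today.
```

Keeping both outputs in one object means a caller can gate on the safety label and redact PII without invoking a second model, which is the pipeline simplification the contribution describes.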

Findings

01 Compact models reach 193 requests/sec with P99 latency below 1s under dynamic batching on a single A100.

02 GLiGuard Omni remains competitive with much larger moderators on public safety benchmarks.

03 Models and benchmarks are publicly available on HuggingFace.

Abstract

Production LLM systems require both safety moderation and PII detection under strict latency and cost constraints. This creates a trade-off: autoregressive moderators are accurate but expensive, while lightweight encoders are faster but less capable. We present GLiNER Guard (GLiGuard), a unified encoder that performs safety classification and PII detection in a single forward pass, simplifying safety pipelines. We introduce three variants: compact uni- and bi-encoders (145-147M) for high-throughput serving, and GLiGuard Omni (209M) for stronger moderation quality. Under dynamic batching on a single A100, the compact model reaches 193 requests/sec with P99 latency below 1s, achieving 1.6x higher throughput than GLiNER2. Omni remains competitive with much larger moderators on public safety benchmarks. We also release PII-Bench, a span-level benchmark for evaluating PII detection in…
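The throughput figures above can be unpacked with simple arithmetic. The sketch below derives the baseline throughput implied by the reported 1.6x ratio and shows one common way (nearest-rank) to compute a P99 latency from a list of samples. The helper names are hypothetical, the latency samples are synthetic, and the abstract does not say which percentile method the authors used.

```python
import math


def implied_baseline(throughput_rps: float, speedup: float) -> float:
    """Baseline throughput implied by 'throughput is speedup x the baseline'."""
    return throughput_rps / speedup


def p99_nearest_rank(latencies_s):
    """Nearest-rank 99th percentile: the value at 1-based rank ceil(0.99 * n)."""
    ordered = sorted(latencies_s)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


# 193 requests/sec at 1.6x implies roughly 120.6 requests/sec for the baseline.
baseline = implied_baseline(193.0, 1.6)
print(round(baseline, 3))

# Synthetic latencies in seconds (NOT measurements from the paper).
samples = [0.10 + 0.004 * i for i in range(200)]
print(p99_nearest_rank(samples) < 1.0)
```

Under these assumptions, the reported numbers imply that GLiNER2 served about 120.6 requests/sec in the same setup. A P99 below 1s means at most 1% of requests took longer than one second.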
