YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models

Junyu Lin; Meizhen Liu; Xiufeng Huang; Jinfeng Li; Haiwen Hong; Xiaohan Yuan; Yuefeng Chen; Longtao Huang; Hui Xue; Ranjie Duan; Zhikai Chen; Yuchuan Fu; Defeng Li; Lingyao Gao; Yitong Yang

arXiv:2601.15588·cs.CL·January 23, 2026

Open Access · 4 Models · 1 Dataset

TL;DR

YuFeng-XGuard is a novel, reasoning-centric guardrail model for LLMs that provides interpretable, multi-dimensional risk assessments with configurable policies, balancing safety, efficiency, and flexibility.

Contribution

It introduces a structured, reasoning-based approach for risk perception in LLMs, enabling interpretable, flexible, and efficient safety guardrails without retraining.

Findings

01. Achieves state-of-the-art safety performance on public benchmarks.

02. Balances decision speed and interpretability effectively.

03. Offers both full and lightweight model variants for diverse deployment.

Abstract

As large language models (LLMs) are increasingly deployed in real-world applications, safety guardrails are required to go beyond coarse-grained filtering and support fine-grained, interpretable, and adaptable risk assessment. However, existing solutions often rely on rapid classification schemes or post-hoc rules, resulting in limited transparency, inflexible policies, or prohibitive inference costs. To this end, we present YuFeng-XGuard, a reasoning-centric guardrail model family designed to perform multi-dimensional risk perception for LLM interactions. Instead of producing opaque binary judgments, YuFeng-XGuard generates structured risk predictions, including explicit risk categories and configurable confidence scores, accompanied by natural language explanations that expose the underlying reasoning process. This formulation enables safety decisions that are both actionable and…
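The structured output described in the abstract (risk categories, confidence scores, and natural language explanations) can be illustrated with a small sketch. This is not the model's actual output schema or API: the `RiskPrediction` type, the `apply_policy` helper, the category names, and the threshold values are all hypothetical, chosen only to show how per-category thresholds might let a deployment change its policy without retraining the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPrediction:
    """Hypothetical container for one structured risk prediction."""
    category: str       # illustrative risk label, not the paper's taxonomy
    confidence: float   # assumed to lie in [0, 1]
    explanation: str    # natural language rationale for the prediction


def apply_policy(predictions, thresholds, default_threshold=0.5):
    """Return the predictions whose confidence meets the category threshold.

    Categories missing from `thresholds` fall back to `default_threshold`.
    """
    flagged = []
    for prediction in predictions:
        limit = thresholds.get(prediction.category, default_threshold)
        if prediction.confidence >= limit:
            flagged.append(prediction)
    return flagged


# Made-up predictions for a single interaction (illustrative values only).
predictions = [
    RiskPrediction("privacy", 0.62, "The request asks for a home address."),
    RiskPrediction("violence", 0.30, "A fight is mentioned in a fictional scene."),
]

# Two hypothetical policies applied to the same predictions.
strict_policy = {"privacy": 0.5, "violence": 0.25}
lenient_policy = {"privacy": 0.8, "violence": 0.8}

for flagged in apply_policy(predictions, strict_policy):
    print(f"{flagged.category}: {flagged.explanation}")
```

Under these assumed values the strict policy flags both predictions and the lenient policy flags neither, while the model output itself is unchanged. Keeping the thresholds outside the model is one plausible reading of "configurable"; the paper may define the mechanism differently.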

Code & Models

Models

Datasets

Alibaba-AAIG/XGuard-Train-Open-200K
dataset · 124 dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Safety Systems Engineering in Autonomy