DeepKnown-Guard: A Proprietary Model-Based Safety Response Framework for AI Agents

Qi Li; Jianjun Xu; Pingtao Wei; Jiu Li; Peiqiang Zhao; Jiwei Shi; Xuan Zhang; Yanhui Yang; Xiaodong Hui; Peng Xu; Wenqin Shao

arXiv:2511.03138 · cs.AI · November 18, 2025

TL;DR

This paper introduces DeepKnown-Guard, a safety framework for LLMs that enhances risk detection and response at input and output levels, achieving high safety scores and robustness in critical applications.

Contribution

The paper presents a novel, proprietary safety response framework combining fine-grained risk classification and retrieval-augmented generation to improve LLM safety and trustworthiness.

Findings

01. Achieved a 99.3% risk recall rate in input safety classification.
02. Attained a 100% safety score on the high-risk test set.
03. Significantly outperformed baseline safety models on public benchmarks.
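
For readers unfamiliar with the metric in the first finding, the sketch below shows how a recall figure of this kind is conventionally computed: the share of truly risky queries that the classifier flags as risky. It assumes the standard definition TP / (TP + FN); the listing does not state the paper's exact evaluation protocol, and the function name `risk_recall` and the sample counts are hypothetical.

```python
from typing import Sequence


def risk_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of truly risky queries that were flagged as risky.

    Assumes the standard recall definition TP / (TP + FN); the paper's
    exact protocol is not given in this listing.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    # True positives: risky queries that were flagged.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    # False negatives: risky queries that slipped through.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp + fn == 0:
        raise ValueError("recall is undefined without any risky examples")
    return tp / (tp + fn)


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 1000 risky queries, of which 993 are flagged and 7 are missed.
labels = [True] * 1000
flags = [True] * 993 + [False] * 7
example_recall = risk_recall(labels, flags)
```

With these illustrative counts, 993 of 1000 risky queries are caught, which corresponds to the 99.3% form in which the finding is reported.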

Abstract

With the widespread application of Large Language Models (LLMs), their associated security issues have become increasingly prominent, severely constraining their trustworthy deployment in critical domains. This paper proposes a novel safety response framework designed to systematically safeguard LLMs at both the input and output levels. At the input level, the framework employs a supervised fine-tuning-based safety classification model. Through a fine-grained four-tier taxonomy (Safe, Unsafe, Conditionally Safe, Focused Attention), it performs precise risk identification and differentiated handling of user queries, significantly enhancing risk coverage and business scenario adaptability, and achieving a risk recall rate of 99.3%. At the output level, the framework integrates Retrieval-Augmented Generation (RAG) with a specifically fine-tuned interpretation model, ensuring all responses…
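
The input-level idea described in the abstract, classifying each query into one of four tiers and handling each tier differently, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: only the four tier names come from the abstract, while the per-tier policy (refuse Unsafe, answer Safe directly, send the two intermediate tiers through a grounded path), the function `route`, its callback parameters, and the refusal wording are all hypothetical.

```python
from enum import Enum
from typing import Callable


class RiskTier(Enum):
    """The four tiers named in the abstract."""
    SAFE = "Safe"
    UNSAFE = "Unsafe"
    CONDITIONALLY_SAFE = "Conditionally Safe"
    FOCUSED_ATTENTION = "Focused Attention"


# Hypothetical refusal text; the paper's actual wording is not given here.
REFUSAL = "Sorry, I can't help with that request."


def route(
    query: str,
    classify: Callable[[str], RiskTier],
    answer: Callable[[str], str],
    grounded_answer: Callable[[str, RiskTier], str],
) -> str:
    """Dispatch a query according to its risk tier.

    Assumed policy (not taken from the paper):
      - Unsafe: refuse outright.
      - Safe: answer with the base model.
      - Conditionally Safe / Focused Attention: answer through a
        retrieval-grounded path that also receives the tier.
    """
    tier = classify(query)
    if tier is RiskTier.UNSAFE:
        return REFUSAL
    if tier is RiskTier.SAFE:
        return answer(query)
    return grounded_answer(query, tier)
```

In this sketch `classify` stands in for the fine-tuned safety classifier and `grounded_answer` for the RAG-backed output stage; passing them as callables keeps the routing logic testable without any model attached.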

Code & Models

Datasets

CaiZhiTech/DeepKnown-High-Risk-zh-20251105
dataset · 7 downloads

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling