ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Yutao Mou; Zhangchi Xue; Lijun Li; Peiyang Liu; Shikun Zhang; Wei Ye; Jing Shao

arXiv:2601.10156 · cs.CL · January 16, 2026

TL;DR

This paper introduces TS-Guard, a guardrail model, and TS-Flow, a guardrail-feedback-driven reasoning framework, which detect unsafe tool invocations in LLM agents before execution, reducing harmful tool invocations while improving benign task completion.

Contribution

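It presents TS-Bench, a novel benchmark for step-level tool invocation safety detection, and develops the proactive guardrail model TS-Guard and the guardrail-feedback-driven reasoning framework TS-Flow to improve the safety and robustness of LLM-based agents.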

Findings

01 Reduces harmful tool invocations by 65% on average.

02 Improves benign task completion by approximately 10%.

03 Provides interpretable safety judgments and feedback.

Abstract

While LLM-based agents can interact with environments via invoking external tools, their expanded capabilities also amplify security risks. Monitoring step-level tool invocation behaviors in real time and proactively intervening before unsafe execution is critical for agent deployment, yet remains under-explored. In this work, we first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents. We then develop a guardrail model, TS-Guard, using multi-task reinforcement learning. The model proactively detects unsafe tool invocation actions before execution by reasoning over the interaction history. It assesses request harmfulness and action-attack correlations, producing interpretable and generalizable safety judgments and feedback. Furthermore, we introduce TS-Flow, a guardrail-feedback-driven reasoning framework for LLM agents, which reduces…
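
The step-level idea described in the abstract (check each proposed tool call against the interaction history before it runs, and hand feedback back to the agent when it is blocked) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToolCall`, `Verdict`, `toy_guard`, and `guarded_step` are hypothetical names, and the keyword rule merely stands in for a learned guardrail. It is not TS-Guard or TS-Flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    name: str
    args: Dict[str, str]


@dataclass
class Verdict:
    safe: bool
    feedback: str


def toy_guard(history: List[str], call: ToolCall) -> Verdict:
    """Hypothetical stand-in for a learned guardrail (a fixed blocklist, NOT TS-Guard)."""
    blocked = {"delete_files", "transfer_funds"}  # assumed example tool names
    if call.name in blocked:
        return Verdict(False, f"'{call.name}' was blocked; choose a safer action.")
    return Verdict(True, "ok")


def guarded_step(
    history: List[str],
    call: ToolCall,
    tools: Dict[str, Callable[..., str]],
    guard: Callable[[List[str], ToolCall], Verdict] = toy_guard,
) -> str:
    """Judge a proposed tool call BEFORE executing it; on a block, return feedback instead."""
    verdict = guard(history, call)
    if not verdict.safe:
        # The tool is never executed; the feedback is recorded for the agent's next step.
        history.append(f"guard: {verdict.feedback}")
        return verdict.feedback
    result = tools[call.name](**call.args)
    history.append(f"{call.name} -> {result}")
    return result


if __name__ == "__main__":
    tools = {
        "search": lambda query: f"results for {query}",
        "delete_files": lambda path: "deleted",
    }
    history = ["user: find the report"]
    print(guarded_step(history, ToolCall("search", {"query": "report"}), tools))
    print(guarded_step(history, ToolCall("delete_files", {"path": "/"}), tools))
```

In this sketch the guard sits between the agent's proposed action and the tool executor, so an unsafe call is intercepted rather than rolled back afterwards; a real guardrail would replace `toy_guard` with a model that reasons over the full history.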

Taxonomy

Topics: Security and Verification in Computing · Access Control and Trust · Advanced Malware Detection Techniques