DRAFT: Task Decoupled Latent Reasoning for Agent Safety
Lin Wang, Junfeng Fang, Dan Zhang, Fei Shen, Xiang Wang, Tat-Seng Chua

TL;DR
DRAFT introduces a latent reasoning framework that improves agent safety monitoring by decoupling safety judgment into two trainable stages, enabling more effective evidence aggregation in long, noisy interaction trajectories.
Contribution
The paper proposes DRAFT, a novel latent reasoning approach that outperforms existing methods on safety assessment benchmarks by decoupling safety judgment into separate stages and aggregating evidence in latent space.
Findings
DRAFT achieves up to 91.18% accuracy on safety benchmarks.
Latent evidence aggregation improves safety judgment accuracy.
Extractor and Reasoner modules work synergistically to enhance safety detection.
Abstract
The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse, making standard binary supervision poorly suited for credit assignment. To address this, we propose DRAFT (Task Decoupled Latent Reasoning for Agent Safety), a latent reasoning framework that decouples safety judgment into two trainable stages: an Extractor that distills the full trajectory into a compact continuous latent draft, and a Reasoner that jointly attends to the draft and the original trajectory to predict safety. DRAFT avoids lossy explicit summarize-then-judge pipelines by performing evidence aggregation in latent space, enabling end-to-end differentiable training. Across benchmarks including ASSEBench and R-Judge, DRAFT consistently outperforms strong baselines, improving accuracy from 63.27% (LoRA) to…
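The abstract describes the Extractor/Reasoner split only at a high level. As a rough, non-authoritative illustration of the data flow, the sketch below assumes single-head scaled dot-product attention with keys reused as values (no learned projections), a fixed number K of learned draft query vectors for the Extractor, and a single "judge" query plus a logistic output head for the Reasoner. All function and parameter names (`extractor`, `reasoner`, `draft_queries`, `judge_query`, `w_out`, `bce`) are hypothetical and not taken from the paper, and the parameters are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Single-head scaled dot-product attention; keys double as values (an assumption).

    Returns a convex combination of the rows in `keys`.
    """
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]


def sigmoid(z):
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def extractor(trajectory, draft_queries):
    """Hypothetical Extractor: each query pools the whole trajectory into one latent slot."""
    return [attend(q, trajectory) for q in draft_queries]


def reasoner(trajectory, draft, judge_query, w_out, b_out):
    """Hypothetical Reasoner: one judge query attends jointly over draft slots and raw tokens."""
    context = attend(judge_query, draft + trajectory)
    return sigmoid(dot(w_out, context) + b_out)  # interpreted here as P(unsafe)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for a single trajectory-level label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Toy run with random (untrained) parameters.
random.seed(0)
D, T, K = 8, 50, 4  # embedding dim, trajectory length, number of draft slots
trajectory = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
draft_queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
judge_query = [random.gauss(0, 1) for _ in range(D)]
w_out = [random.gauss(0, 1) for _ in range(D)]

draft = extractor(trajectory, draft_queries)  # K x D continuous latent draft
p_unsafe = reasoner(trajectory, draft, judge_query, w_out, 0.0)
loss = bce(p_unsafe, 1)  # pretend the trajectory-level label is "unsafe"
print(f"draft slots: {len(draft)}, P(unsafe) = {p_unsafe:.3f}, loss = {loss:.3f}")
```

Because every step is a soft weighted average rather than a hard selection of text, the loss in this toy setup is differentiable with respect to the draft queries, the judge query, and the output head, so in an autodiff framework a single binary label could update both stages at once. A summarize-then-judge pipeline that emits a discrete text summary would break that gradient path. Whether DRAFT's actual Extractor and Reasoner are built this way (for example, on top of a pretrained LLM) is not stated in the abstract.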
