DRAFT: Task Decoupled Latent Reasoning for Agent Safety

Lin Wang; Junfeng Fang; Dan Zhang; Fei Shen; Xiang Wang; Tat-Seng Chua

arXiv:2604.03242 · cs.LG · April 7, 2026


TL;DR

DRAFT introduces a latent reasoning framework that improves agent safety monitoring by decoupling safety judgment into two trainable stages, enabling more effective evidence aggregation in long, noisy interaction trajectories.

Contribution

The paper proposes DRAFT, a latent reasoning approach that outperforms existing methods on safety assessment benchmarks by decoupling safety judgment into an extraction stage and a reasoning stage, and aggregating evidence in latent space.

Findings

01

DRAFT achieves up to 91.18% accuracy on safety benchmarks.

02

Latent evidence aggregation improves safety judgment accuracy.

03

The Extractor and Reasoner modules work together to improve safety detection.

Abstract

The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse, making standard binary supervision poorly suited for credit assignment. To address this, we propose DRAFT (Task Decoupled Latent Reasoning for Agent Safety), a latent reasoning framework that decouples safety judgment into two trainable stages: an Extractor that distills the full trajectory into a compact continuous latent draft, and a Reasoner that jointly attends to the draft and the original trajectory to predict safety. DRAFT avoids lossy explicit summarize-then-judge pipelines by performing evidence aggregation in latent space, enabling end-to-end differentiable training. Across benchmarks including ASSEBench and R-Judge, DRAFT consistently outperforms strong baselines, improving accuracy from 63.27% (LoRA) to…
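The abstract describes the two-stage design only at a high level. What follows is a minimal, forward-pass-only sketch of that idea. It rests on assumptions made here for illustration and is not the authors' implementation. First, the trajectory is assumed to be already embedded as one vector per step. Second, the Extractor is modeled as attention pooling with a few learned query vectors (a Perceiver-style choice not confirmed by the paper). Third, the Reasoner is modeled as a single learned query attending over the concatenation of draft and trajectory, followed by a logistic head whose output is read as a hypothetical probability that the trajectory is unsafe. All names (`Extractor`, `Reasoner`, `attend`, `p_unsafe`) are hypothetical, and the weights are random rather than trained. The paper's actual module architectures are not specified in this summary.

```python
import math
import random

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vec, memory: list[Vec]) -> Vec:
    """Scaled dot-product attention for one query; keys and values are both `memory`."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, m) / scale for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(len(memory[0]))]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Extractor:
    """Hypothetical: K learned queries pool the full trajectory into K latent draft vectors."""

    def __init__(self, num_slots: int, dim: int, rng: random.Random):
        self.queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]

    def __call__(self, trajectory: list[Vec]) -> list[Vec]:
        # Continuous draft: no tokens are decoded, so there is no text-summary bottleneck.
        return [attend(q, trajectory) for q in self.queries]


class Reasoner:
    """Hypothetical: one learned query attends jointly over [draft; trajectory]."""

    def __init__(self, dim: int, rng: random.Random):
        self.query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.bias = 0.0

    def __call__(self, draft: list[Vec], trajectory: list[Vec]) -> float:
        # Draft slots and raw trajectory steps compete in a single softmax.
        pooled = attend(self.query, draft + trajectory)
        # Logistic head; the output is read as a hypothetical P(unsafe).
        return sigmoid(dot(self.weights, pooled) + self.bias)


rng = random.Random(0)
dim, num_steps = 8, 50
# Stand-in for per-step embeddings of a tool-use trajectory (assumed precomputed).
trajectory = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_steps)]

extractor = Extractor(num_slots=4, dim=dim, rng=rng)
reasoner = Reasoner(dim=dim, rng=rng)

draft = extractor(trajectory)
p_unsafe = reasoner(draft, trajectory)
print(f"draft: {len(draft)} x {len(draft[0])}, p_unsafe = {p_unsafe:.3f}")
```

Every operation in the sketch (dot products, softmax, sigmoid) is differentiable. In an autodiff framework, a binary cross-entropy loss on `p_unsafe` would therefore send gradients through the Reasoner back into the Extractor's queries. This is the property the abstract contrasts with summarize-then-judge pipelines, where the intermediate summary is discrete text and cannot pass gradients.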
