HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark

Jiacheng Wang; Jinchang Hou; Fabian Wang; Ping Jian; Chenfu Bao; Zhonghou Lv

arXiv:2604.13954 · cs.LG · April 16, 2026


TL;DR

HINTBench is a new benchmark designed to evaluate intrinsic, long-horizon risks in agent trajectories, revealing significant gaps in current models' ability to detect and diagnose these risks.

Contribution

The paper introduces HINTBench, a benchmark of 629 annotated agent trajectories that supports three tasks: intrinsic risk detection, risk-step localization, and intrinsic failure-type identification.

Findings

01

Strong LLMs perform well on trajectory-level risk detection.

02

Performance drops below 35% F1 on risk-step localization.

03

Existing guard models transfer poorly to intrinsic risk auditing.
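
To make the localization number in finding 02 concrete, here is a minimal sketch of how a step-level F1 could be computed for one trajectory. It assumes that predictions and annotations are both sets of risky step indices and that the score is plain set-overlap F1; the page does not state the paper's exact scoring protocol, so the function name `step_f1` and this definition are illustrative assumptions, not the benchmark's official metric.

```python
def step_f1(predicted_steps, annotated_steps):
    """Set-overlap F1 between predicted and annotated risky step indices.

    Assumption: both arguments are iterables of integer step indices for a
    single trajectory. This is a hypothetical scoring rule for illustration.
    """
    pred, gold = set(predicted_steps), set(annotated_steps)
    if not pred and not gold:
        # Nothing to find and nothing predicted: treat as a perfect match.
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Example: two of three predicted steps match two of three annotated steps.
score = step_f1([3, 7, 12], [7, 12, 20])
print(f"step-level F1 = {score:.3f}")
```

Under this definition, a model that flags the right region of a 33-step trajectory but misses the exact steps still scores low, which is one way a detection-versus-localization gap can arise.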

Abstract

Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored setting through the lens of intrinsic risk, where intrinsic failures remain latent, propagate across long-horizon execution, and eventually lead to high-consequence outcomes. To evaluate this setting, we introduce non-attack intrinsic risk auditing and present HINTBench, a benchmark of 629 agent trajectories (523 risky, 106 safe; 33 steps on average) supporting three tasks: risk detection, risk-step localization, and intrinsic failure-type identification. Its annotations are organized under a unified five-constraint taxonomy. Experiments reveal a substantial capability gap: strong LLMs perform well on trajectory-level risk detection, but their performance drops to…
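
The class split quoted in the abstract (523 risky, 106 safe) is worth keeping in mind when reading trajectory-level detection scores. The sketch below works out what a trivial "always predict risky" baseline would score on that split. This is arithmetic on the stated counts only, not a result reported by the paper, and the helper name `always_risky_baseline` is hypothetical.

```python
def always_risky_baseline(num_risky, num_safe):
    """Accuracy and F1 (risky = positive class) for a classifier that labels
    every trajectory as risky. Illustrative arithmetic, not a paper result."""
    total = num_risky + num_safe
    accuracy = num_risky / total
    precision = num_risky / total  # every trajectory is predicted positive
    recall = 1.0                   # no risky trajectory is missed
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Counts taken from the abstract: 523 risky and 106 safe trajectories.
accuracy, f1 = always_risky_baseline(523, 106)
print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")  # accuracy = 0.831, F1 = 0.908
```

Because the positive class dominates, detection scores are easier to interpret alongside per-class or safe-class metrics, if the paper reports them.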
