TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents

Yanyu Chen; Jiyue Jiang; Jiahong Liu; Yifei Zhang; Xiao Guo; Irwin King

arXiv:2602.21230 · cs.CL · February 26, 2026

TL;DR

TRACE offers a comprehensive evaluation framework for deep research agents by assessing their entire problem-solving process, addressing limitations of traditional metrics and static benchmarks to better measure reasoning quality, robustness, and latent capabilities.

Contribution

The paper introduces TRACE, a novel holistic evaluation framework with new metrics and a benchmark, enabling detailed assessment of agents' reasoning, efficiency, and latent abilities.

Findings

01. TRACE provides granular rankings of agents.

02. It uncovers trade-offs between accuracy, efficiency, and robustness.

03. The framework improves understanding of agent capabilities.

Abstract

The evaluation of Deep Research Agents is a critical challenge, as conventional outcome-based metrics fail to capture the nuances of their complex reasoning. Current evaluation faces two primary challenges: 1) a reliance on singular metrics like Pass@1, creating a "high-score illusion" that ignores the quality, efficiency, and soundness of the reasoning process; and 2) the failure of static benchmarks to quantify crucial attributes like robustness and latent capability. To address these gaps, we introduce TRACE (Trajectory-Aware Comprehensive Evaluation), a framework that holistically assesses the entire problem-solving trajectory. To counter the "high-score illusion", we propose a Hierarchical Trajectory Utility Function that quantifies process efficiency and cognitive quality, including evidence grounding, alongside accuracy. To measure deeper attributes, TRACE introduces a Scaffolded…
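The "high-score illusion" described above can be illustrated with a small sketch. This is not the paper's Hierarchical Trajectory Utility Function, whose definition is not given in the abstract. It is a toy stand-in under stated assumptions: the `Trajectory` fields, the `toy_utility` name, the step budget, and the weights are all hypothetical, chosen only to show how two agents with identical Pass@1 can differ once efficiency and evidence grounding are scored.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one agent run (fields are illustrative assumptions)."""
    correct: bool         # final answer matched the reference
    steps: int            # reasoning steps / tool calls used
    grounded_claims: int  # claims supported by retrieved evidence
    total_claims: int     # all claims made in the trajectory


def pass_at_1(trajectories):
    """Fraction of tasks whose single attempt was correct."""
    return sum(t.correct for t in trajectories) / len(trajectories)


def toy_utility(t, step_budget=10, w_eff=0.3, w_ground=0.3):
    """Toy trajectory score: accuracy gates the process terms.

    Assumption: an incorrect answer scores 0; a correct one earns a base
    credit plus weighted efficiency and grounding bonuses.
    """
    if not t.correct:
        return 0.0
    efficiency = max(0.0, 1.0 - t.steps / step_budget)
    grounding = t.grounded_claims / t.total_claims if t.total_claims else 0.0
    base = 1.0 - w_eff - w_ground
    return base + w_eff * efficiency + w_ground * grounding


def mean_utility(trajectories):
    """Average toy utility over a set of trajectories."""
    return sum(toy_utility(t) for t in trajectories) / len(trajectories)


# Two hypothetical agents, each solving one of two tasks.
agent_a = [Trajectory(True, 4, 5, 5), Trajectory(False, 10, 1, 4)]
agent_b = [Trajectory(True, 10, 1, 5), Trajectory(False, 3, 2, 2)]
```

Both hypothetical agents have the same Pass@1, yet `mean_utility(agent_a)` exceeds `mean_utility(agent_b)` because agent A reaches its correct answer in fewer steps and with better-grounded claims. Gating the process terms on correctness is one possible reading of "hierarchical"; the paper may define it differently.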

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI