TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents
Yanyu Chen, Jiyue Jiang, Jiahong Liu, Yifei Zhang, Xiao Guo, Irwin King

TL;DR
TRACE is an evaluation framework for deep research agents that assesses the entire problem-solving trajectory rather than only the final answer. It addresses the limitations of single outcome metrics and static benchmarks in order to better measure reasoning quality, robustness, and latent capability.
Contribution
The paper introduces TRACE, a novel holistic evaluation framework with new metrics and a benchmark, enabling detailed assessment of agents' reasoning, efficiency, and latent abilities.
Findings
TRACE provides granular rankings of agents.
It uncovers trade-offs between accuracy, efficiency, and robustness.
The framework improves understanding of agent capabilities.
Abstract
The evaluation of Deep Research Agents is a critical challenge, as conventional outcome-based metrics fail to capture the nuances of their complex reasoning. Current evaluation faces two primary challenges: 1) a reliance on singular metrics like Pass@1, creating a "high-score illusion" that ignores the quality, efficiency, and soundness of the reasoning process; and 2) the failure of static benchmarks to quantify crucial attributes like robustness and latent capability. To address these gaps, we introduce TRACE (Trajectory-Aware Comprehensive Evaluation), a framework that holistically assesses the entire problem-solving trajectory. To counter the "high-score illusion", we propose a Hierarchical Trajectory Utility Function that quantifies process efficiency and cognitive quality, including evidence grounding, alongside accuracy. To measure deeper attributes, TRACE introduces a Scaffolded…
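To make the "high-score illusion" concrete, here is a toy Python sketch. It is not the paper's Hierarchical Trajectory Utility Function, whose definition is not given in this excerpt. The `Trajectory` fields, the weights, the step budget, and the weighted-sum form of `toy_utility` are all assumptions chosen for illustration. It shows only that two agents can tie on Pass@1 while differing sharply once efficiency and evidence grounding are scored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical record of one agent run (fields are illustrative)."""
    correct: bool         # final answer matches the reference
    steps: int            # tool calls / reasoning steps taken
    grounded_claims: int  # claims backed by retrieved evidence
    total_claims: int     # all claims made along the way


def pass_at_1(trajs: List[Trajectory]) -> float:
    """Outcome-only metric: fraction of runs with a correct final answer."""
    return sum(t.correct for t in trajs) / len(trajs)


def toy_utility(t: Trajectory, step_budget: int = 10,
                w_acc: float = 0.5, w_eff: float = 0.25,
                w_ground: float = 0.25) -> float:
    """Assumed weighted sum of accuracy, efficiency, and grounding.

    This is a stand-in for a trajectory-aware score, not TRACE's metric.
    """
    efficiency = max(0.0, 1.0 - t.steps / step_budget)
    grounding = t.grounded_claims / t.total_claims if t.total_claims else 0.0
    return w_acc * float(t.correct) + w_eff * efficiency + w_ground * grounding


def mean_utility(trajs: List[Trajectory]) -> float:
    return sum(toy_utility(t) for t in trajs) / len(trajs)


# Two hypothetical agents: both always answer correctly, but agent_b
# takes more steps and supports fewer of its claims with evidence.
agent_a = [Trajectory(True, 4, 4, 4), Trajectory(True, 4, 4, 4)]
agent_b = [Trajectory(True, 9, 1, 4), Trajectory(True, 9, 1, 4)]

print("Pass@1:", pass_at_1(agent_a), pass_at_1(agent_b))
print("Toy utility:", mean_utility(agent_a), mean_utility(agent_b))
```

Under these made-up numbers the two agents are indistinguishable on Pass@1, while the toy trajectory score separates them. The particular weights carry no significance.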
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
