Evaluating Stochasticity in Deep Research Agents
Haotian Zhai, Elias Stengel-Eskin, Pratik Patil, Liu Leqi

TL;DR
This paper studies the run-to-run variability in Deep Research Agents' (DRAs') outputs caused by stochastic processes, formalizes the sources of this variability, and proposes mitigation strategies to improve research consistency and quality.
Contribution
It models DRAs as information acquisition Markov Decision Processes and introduces an evaluation framework to quantify output variance and analyze its sources.
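To make "quantifying variance" concrete, here is a minimal sketch of two run-to-run variability measures one could compute over repeated executions of the same query. It assumes each run yields a final answer string and a set of cited URLs; the measures (answer disagreement rate and mean pairwise Jaccard distance of citation sets) and all function names are illustrative choices, not necessarily the metrics the authors use.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def citation_variability(runs: list[set[str]]) -> float:
    """Hypothetical measure: mean pairwise Jaccard distance between citation sets of repeated runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0  # a single run has no run-to-run variability
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def answer_disagreement(answers: list[str]) -> float:
    """Hypothetical measure: fraction of run pairs whose normalized final answers differ."""
    normalized = [a.strip().lower() for a in answers]
    pairs = list(combinations(normalized, 2))
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Three made-up runs of the same query (toy data, not results from the paper).
    citations = [{"a.com", "b.org"}, {"a.com", "c.net"}, {"a.com", "b.org"}]
    answers = ["Yes", "yes ", "No"]
    print(round(citation_variability(citations), 3))  # 0.444
    print(round(answer_disagreement(answers), 3))     # 0.667
```

Pairwise averages are used so that the measures do not depend on the order in which runs were collected; both are 0 when every run agrees exactly.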
Findings
Reducing stochasticity improves research output quality.
Inference and early-stage stochasticity are major contributors to output variance.
Mitigation strategies decrease stochasticity by 22% without compromising quality.
Abstract
Deep Research Agents (DRAs) are promising agentic systems that gather and synthesize information to support research across domains such as financial decision-making, medical analysis, and scientific discovery. Despite recent improvements in research quality (e.g., outcome accuracy when ground truth is available), DRA system design often overlooks a critical barrier to real-world deployment: stochasticity. Under identical queries, repeated executions of DRAs can exhibit substantial variability in terms of research outcome, findings, and citations. In this paper, we formalize the study of stochasticity in DRAs by modeling them as information acquisition Markov Decision Processes. We introduce an evaluation framework that quantifies variance in the system, and we identify three sources of this variance: information acquisition, information compression, and inference. Through controlled experiments, we…
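One way to picture how variance can be attributed to the three named sources is a toy pipeline in which acquisition, compression, and inference each draw from their own seeded random generator. Re-seeding one stage while holding the others fixed isolates that stage's contribution. This is a minimal sketch under stated assumptions: the corpus, the stage functions (`acquire`, `compress`, `infer`), the noise levels, and the numeric "answer" are all hypothetical stand-ins, not the authors' framework or agent.

```python
import random
import statistics

# Hypothetical toy corpus; a real DRA would search the web.
SOURCES = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
STAGES = ("acquire", "compress", "infer")


def acquire(rng: random.Random, k: int = 3) -> list[str]:
    """Hypothetical information acquisition: randomly choose which sources to read."""
    return rng.sample(SOURCES, k)


def compress(rng: random.Random, docs: list[str]) -> list[str]:
    """Hypothetical information compression: randomly keep a subset of retrieved docs."""
    kept = [d for d in docs if rng.random() < 0.8]
    return kept or docs[:1]  # always keep at least one note


def infer(rng: random.Random, notes: list[str]) -> float:
    """Hypothetical inference: a noisy numeric answer derived from the retained notes."""
    evidence = sum(SOURCES.index(n) for n in notes) / len(notes)
    return evidence + rng.gauss(0.0, 0.5)


def run(seeds: dict[str, int]) -> float:
    """One end-to-end execution; each stage gets its own independent RNG."""
    docs = acquire(random.Random(seeds["acquire"]))
    notes = compress(random.Random(seeds["compress"]), docs)
    return infer(random.Random(seeds["infer"]), notes)


def stage_variance(stage: str, n_runs: int = 200) -> float:
    """Output variance when only `stage` is re-seeded and the other stages are frozen."""
    outputs = []
    for i in range(n_runs):
        seeds = {s: 0 for s in STAGES}  # freeze every stage
        seeds[stage] = i + 1            # then vary only the stage under study
        outputs.append(run(seeds))
    return statistics.pvariance(outputs)


if __name__ == "__main__":
    for s in STAGES:
        print(s, round(stage_variance(s), 3))
```

This one-at-a-time ablation ignores interactions between stages (e.g., acquisition changing which documents compression can drop); a full variance decomposition would need to vary stages jointly as well.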
Taxonomy
Topics: Scientific Computing and Data Management · Mobile Crowdsensing and Crowdsourcing · Explainable Artificial Intelligence (XAI)
