TL;DR
This paper investigates how uncertainty and information contamination affect structured agent workflows, introducing a trace-based measurement framework and an empirical analysis of 614 paired runs on 32 GAIA tasks with three language models.
Contribution
It provides a formal taxonomy of contamination types, a trace-based detection framework, and empirical insights into contamination phenomena in structured workflows.
Findings
Workflows can diverge significantly yet still produce correct answers.
Contamination manifests as silent semantic errors, behavioral detours, or structural disruptions.
Common verification methods often fail to detect contamination.
Abstract
Reasoning over heterogeneous artifacts (PDFs, spreadsheets, slide decks, etc.) increasingly occurs within structured agent workflows that iteratively extract, transform, and reference external information. In these workflows, uncertainty is not merely an input-quality issue: it can redirect decomposition and routing decisions, reshape intermediate state, and produce qualitatively different execution trajectories. We study this phenomenon by treating uncertainty as a controlled variable: we inject structured perturbations into artifact-derived representations, execute fixed workflows under comprehensive logging, and quantify contamination via trace divergence in plans, tool invocations, and intermediate state. Across 614 paired runs on 32 GAIA tasks with three different language models, we find a decoupling: workflows may diverge substantially yet recover correct answers, or remain…
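The idea of quantifying contamination via trace divergence between a clean run and a perturbed run can be illustrated with a minimal sketch. This is not the paper's metric, which the abstract does not specify. It assumes a trace is reduced to a list of step labels (hypothetical names such as `read_pdf`) and uses a generic sequence-similarity score from Python's `difflib` as a stand-in. The function names, the `threshold` value of 0.3, and the returned fields are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def trace_divergence(baseline, perturbed):
    """Return a divergence score in [0, 1] between two step sequences.

    Assumption: divergence is 1 minus difflib's similarity ratio.
    This is a stand-in, not the measure used in the paper.
    """
    if not baseline and not perturbed:
        return 0.0
    matcher = SequenceMatcher(None, baseline, perturbed, autojunk=False)
    return 1.0 - matcher.ratio()


def summarize_pair(baseline, perturbed, baseline_correct, perturbed_correct,
                   threshold=0.3):
    """Record trajectory divergence and answer correctness separately.

    Keeping the two signals apart makes it possible to see runs where the
    trajectory changes but the answer does not, and the reverse.
    The threshold is a hypothetical cutoff chosen for this example.
    """
    score = trace_divergence(baseline, perturbed)
    return {
        "divergence": score,
        "diverged": score >= threshold,
        "answer_changed": baseline_correct != perturbed_correct,
    }


# Hypothetical traces: the perturbed run takes one extra detour step.
clean_run = ["plan", "read_pdf", "extract_table", "compute", "answer"]
noisy_run = ["plan", "read_pdf", "web_search", "extract_table", "compute", "answer"]

summary = summarize_pair(clean_run, noisy_run,
                         baseline_correct=True, perturbed_correct=True)
```

Reporting divergence and correctness as separate fields, rather than folding them into one score, mirrors the decoupling described in the abstract: a run can change its trajectory without changing its final answer.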
