ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation
Tarun Mittal

TL;DR
This paper introduces ChromaFlow, a tool-augmented autonomous reasoning framework, and uses it to measure the effect of orchestration overhead. Expanded orchestration added operational noise without improving overall performance.
Contribution
The paper provides a detailed negative ablation study showing that more aggressive orchestration does not necessarily improve agent evaluation accuracy and does increase operational noise.
Findings
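More orchestration did not improve full-set performance: the expanded-orchestration configuration scored 27/53 (50.94%) against the frozen baseline's 29/53 (54.72%).
Increased orchestration led to more operational failures and noise, including more tracebacks, timeout events, and tool-failure mentions.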
Explicit integrity controls improved accuracy but increased costs.
Abstract
Autonomous language-model agents increasingly combine planning, tool use, document processing, browsing, code execution, and verification loops. These capabilities make agent systems more useful, but they also introduce operational failure modes that are not visible from final accuracy alone. This report presents ChromaFlow, a tool-augmented autonomous reasoning framework built around planner-directed execution, specialized tool use, and telemetry-driven evaluation. We analyze ChromaFlow on GAIA 2023 Level-1 validation tasks under clean evaluation constraints. A frozen full Level-1 baseline achieved 29/53 correct answers, or 54.72%. A later recovery configuration with expanded orchestration achieved 27/53 correct answers, or 50.94%, while increasing tracebacks, timeout events, tool-failure mentions, token-log calls, and campaign-log cost estimates. Two randomized 20-task smoke…
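The accuracy figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical and are not taken from ChromaFlow itself. Only the task counts (29, 27, and 53) come from the abstract.

```python
# Illustrative check of the reported accuracies; names here are hypothetical,
# only the counts 29, 27, and 53 come from the abstract.

def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


TOTAL_TASKS = 53        # GAIA 2023 Level-1 validation tasks
BASELINE_CORRECT = 29   # frozen full Level-1 baseline
RECOVERY_CORRECT = 27   # recovery configuration with expanded orchestration

baseline = accuracy_pct(BASELINE_CORRECT, TOTAL_TASKS)
recovery = accuracy_pct(RECOVERY_CORRECT, TOTAL_TASKS)

# Compute the gap from the raw counts so rounding is applied only once.
delta_points = round(
    100.0 * (RECOVERY_CORRECT - BASELINE_CORRECT) / TOTAL_TASKS, 2
)

print(f"baseline: {baseline}%")
print(f"recovery: {recovery}%")
print(f"change:   {delta_points} percentage points")
```

The two-task difference between the configurations corresponds to a drop of roughly 3.77 percentage points on the 53-task set.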
