ChromaFlow: A Negative Ablation Study of Orchestration Overhead in Tool-Augmented Agent Evaluation

Tarun Mittal

arXiv:2605.14102·cs.AI·May 20, 2026

TL;DR

This paper presents ChromaFlow, a tool-augmented autonomous reasoning framework, and uses it to study orchestration overhead. On GAIA 2023 Level-1 validation tasks, expanded orchestration added operational noise without improving full-set accuracy.

Contribution

A negative ablation study: a recovery configuration with expanded orchestration did not beat the frozen baseline on accuracy, yet it increased tracebacks, timeout events, tool-failure mentions, token-log calls, and campaign-log cost estimates.

Findings

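01. Expanded orchestration did not improve full-set performance: 27/53 correct (50.94%) versus 29/53 (54.72%) for the frozen baseline.

02. Expanded orchestration produced more operational failures and noise, including tracebacks, timeout events, and tool-failure mentions.

03. Explicit integrity controls improved accuracy but increased costs.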

Abstract

Autonomous language-model agents increasingly combine planning, tool use, document processing, browsing, code execution, and verification loops. These capabilities make agent systems more useful, but they also introduce operational failure modes that are not visible from final accuracy alone. This report presents ChromaFlow, a tool-augmented autonomous reasoning framework built around planner-directed execution, specialized tool use, and telemetry-driven evaluation. We analyze ChromaFlow on GAIA 2023 Level-1 validation tasks under clean evaluation constraints. A frozen full Level-1 baseline achieved 29/53 correct answers, or 54.72%. A later recovery configuration with expanded orchestration achieved 27/53 correct answers, or 50.94%, while increasing tracebacks, timeout events, tool-failure mentions, token-log calls, and campaign-log cost estimates. Two randomized 20-task smoke…
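The two headline accuracies can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the ChromaFlow codebase: the function name `accuracy_pct` and the variable names are assumptions, and only the counts (29, 27, and 53) come from the abstract.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals.

    Hypothetical helper for illustration; not part of ChromaFlow.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return round(100.0 * correct / total, 2)


# Counts reported in the abstract for GAIA 2023 Level-1 validation.
TOTAL_TASKS = 53
BASELINE_CORRECT = 29   # frozen full Level-1 baseline
RECOVERY_CORRECT = 27   # recovery configuration with expanded orchestration

baseline_pct = accuracy_pct(BASELINE_CORRECT, TOTAL_TASKS)
recovery_pct = accuracy_pct(RECOVERY_CORRECT, TOTAL_TASKS)

# Compute the gap from the raw counts so rounding is applied only once.
delta_points = round(
    100.0 * (RECOVERY_CORRECT - BASELINE_CORRECT) / TOTAL_TASKS, 2
)

print(f"baseline: {baseline_pct}%")
print(f"recovery: {recovery_pct}%")
print(f"change:   {delta_points} percentage points")
```

Under these assumptions the script reproduces the reported 54.72% and 50.94%. The gap is two tasks out of 53, or roughly 3.77 percentage points, which is worth keeping in mind when reading small differences on a 53-task set.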
