Agent Capsules: Quality-Gated Granularity Control for Multi-Agent LLM Pipelines
Aninda Ray

TL;DR
Agent Capsules is an adaptive runtime system that optimizes multi-agent LLM pipelines by balancing token efficiency and output quality through dynamic mode switching and empirical quality constraints.
Contribution
It introduces a novel framework that treats multi-agent pipeline execution as an optimization problem, achieving significant token savings while maintaining quality without extensive manual tuning.
Findings
Uses 51% fewer tokens in a 14-agent pipeline at similar quality.
Achieves 19% fewer tokens than uncompiled DSPy at quality parity.
Matches hand-tuned and compile-time baselines without training data.
Abstract
A multi-agent pipeline with N agents typically issues N LLM calls per run. Merging agents into fewer calls (compound execution) promises token savings, but naively merged calls silently degrade quality through tool loss and prompt compression. We present Agent Capsules, an adaptive execution runtime that treats multi-agent pipeline execution as an optimization problem with empirical quality constraints. The runtime instruments coordination overhead per group, scores composition opportunity, selects among three compound execution strategies, and gates every mode switch on rolling-mean output quality. A controlled negative result confirms that injecting more context into a merged call worsens compression rather than relieving it, so the framework's escalation ladder (standard, then two-phase, then sequential) recovers quality by moving toward per-agent dispatch rather than by rewriting…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
