Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman

TL;DR
This study evaluates design choices for compound LLM agents in an adversarial POMDP (CybORG CAGE-2) and finds that structured hierarchies and context engineering outperform more complex deliberation strategies in both performance and cost-effectiveness.
Contribution
The study provides a controlled empirical analysis of LLM agent design trade-offs in an adversarial POMDP, highlighting the benefits of hierarchy and context engineering over deliberation.
Findings
Programmatic state abstraction improves mean return by up to 76%.
Hierarchical decomposition without deliberation yields the best performance.
Distributed deliberation tools can cause a destructive cascade, reducing performance.
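The abstract notes that reward in this environment is non-positive, so every mean return is zero or negative. One plausible reading of a percentage "improvement in mean return" is therefore a relative reduction in the size of the penalty, measured against the baseline's magnitude. The sketch below shows that reading; the formula and the example numbers are illustrative assumptions, not the paper's definition or results.

```python
def relative_improvement(baseline_mean: float, new_mean: float) -> float:
    """Relative change of new_mean over baseline_mean, as a fraction.

    Assumes non-positive returns (a penalty-only reward), so the baseline's
    magnitude is the denominator: moving from -100 to -24 is +0.76 (76%).
    Negative values mean the new configuration is worse.
    """
    if baseline_mean == 0:
        raise ValueError("baseline mean return must be non-zero")
    return (new_mean - baseline_mean) / abs(baseline_mean)


# Hypothetical mean returns, chosen only to illustrate the arithmetic.
baseline = -100.0
with_state_abstraction = -24.0
print(f"{relative_improvement(baseline, with_state_abstraction):.0%}")  # prints 76%
```

Dividing by the absolute value keeps the sign intuitive: a return closer to zero always counts as an improvement, even though both values are negative.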
Abstract
Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across components. Yet practitioners lack guidance on which design choices improve performance versus merely increase inference costs. We present a controlled study of compound LLM agent design in CybORG CAGE-2, a cyber defense environment modeled as a Partially Observable Markov Decision Process (POMDP). Reward is non-positive, so all configurations operate in a failure-mitigation mode. Our evaluation spans five model families, six models, and twelve configurations (3,475 episodes) with token-level cost accounting. We vary context representation (raw observations vs. a deterministic state-tracking layer with compressed history), deliberation (self-questioning,…
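The abstract contrasts raw observations with a deterministic state-tracking layer that keeps a compressed history. A minimal sketch of what such a layer could look like, assuming a hypothetical observation format (host name mapped to an `activity` and a `compromise` field, loosely modeled on a cyber-defense view), is below. It keeps persistent per-host beliefs, because a single quiet observation in a POMDP does not prove a host is clean, and a bounded window of salient events, then renders both as a short text block for the prompt. The class, method, field, and host names are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

# Hypothetical compromise levels, ordered by severity (an assumption).
LEVELS = ("none", "unknown", "user", "privileged")


class StateTracker:
    """Deterministic host beliefs plus a bounded log of salient events.

    Hypothetical input format: each raw observation maps host name to
    {"activity": str, "compromise": str}.
    """

    def __init__(self, max_events: int = 4) -> None:
        self.compromise: dict[str, str] = {}
        self.events: deque = deque(maxlen=max_events)  # oldest events drop out

    def update(self, step: int, observation: dict) -> None:
        # Sorted iteration keeps the output deterministic for a given input.
        for host, obs in sorted(observation.items()):
            current = self.compromise.get(host, "none")
            seen = obs.get("compromise", "none")
            activity = obs.get("activity", "none")
            if activity != "none":
                self.events.append(f"t={step} {host}: {activity}")
            # Escalate only: under partial observability a quiet step
            # does not clear an existing belief.
            if LEVELS.index(seen) > LEVELS.index(current):
                self.compromise[host] = seen
                self.events.append(f"t={step} {host}: {current}->{seen}")

    def mark_restored(self, step: int, host: str) -> None:
        """Clear a host's belief after a (hypothetical) restore action."""
        self.compromise[host] = "none"
        self.events.append(f"t={step} {host}: restored")

    def render(self) -> str:
        """Compact text block intended to replace raw observations in a prompt."""
        flagged = [f"{h}={c}" for h, c in sorted(self.compromise.items()) if c != "none"]
        lines = ["Compromised: " + (", ".join(flagged) or "none")]
        lines += ["Recent: " + e for e in self.events]
        return "\n".join(lines)


# Usage with made-up observations.
tracker = StateTracker()
tracker.update(1, {"User1": {"activity": "scan", "compromise": "none"}})
tracker.update(2, {"User1": {"activity": "exploit", "compromise": "user"}})
tracker.update(3, {"User1": {"activity": "none", "compromise": "none"}})
print(tracker.render())
# Compromised: User1=user
# Recent: t=1 User1: scan
# Recent: t=2 User1: exploit
# Recent: t=2 User1: none->user
```

The design choice being illustrated is that the tracker, not the LLM, does the bookkeeping: beliefs are updated by fixed rules, and the prompt stays short because the history window is capped at `max_events` lines regardless of episode length.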
