Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
Rui Liu

TL;DR
This paper investigates the effectiveness and limits of a hierarchical multi-agent AI system, in which a manager model directs a worker model, on software engineering tasks. It highlights both the importance of organizational structure and the training gaps that structure exposes in current models.
Contribution
It introduces ManagerWorker, a two-agent pipeline with a manager and a worker, and analyzes how organizational structure affects performance, revealing training gaps in current models.
Findings
A strong manager directing a weak worker (62%) matches a strong single agent (60%) in performance.
A weak manager directing a weak worker (42%) performs worse than a weak agent alone.
Active direction and structured exploration significantly improve task success.
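The abstract reports resolve rates of 62% (strong manager, weak worker), 60% (strong single agent), and 42% (weak manager, weak worker) on 200 SWE-bench Lite instances. As a rough check on what "matches" means at this sample size, the following sketch converts those rates to instance counts and computes normal-approximation standard errors. It assumes, as a simplification not taken from the paper, that each instance is an independent Bernoulli trial and that the two runs being compared are independent samples.

```python
import math

N = 200  # SWE-bench Lite instances evaluated, as stated in the abstract

# Resolve rates as stated in the abstract
rates = {
    "strong manager + weak worker": 0.62,
    "strong single agent": 0.60,
    "weak manager + weak worker": 0.42,
}


def resolved_count(rate: float, n: int = N) -> int:
    """Number of instances resolved at the given rate."""
    return round(rate * n)


def std_error(rate: float, n: int = N) -> float:
    """Normal-approximation standard error of a proportion (assumes independent trials)."""
    return math.sqrt(rate * (1 - rate) / n)


for name, r in rates.items():
    print(f"{name}: {resolved_count(r)}/{N} resolved, SE ~ {100 * std_error(r):.1f} pp")

# Gap between the strong-manager pipeline and the strong single agent
gap = resolved_count(0.62) - resolved_count(0.60)
# SE of the difference, treating the two runs as independent samples (an assumption)
diff_se = math.sqrt(std_error(0.62) ** 2 + std_error(0.60) ** 2)
print(f"gap: {gap} instances (2 pp), SE of difference ~ {100 * diff_se:.1f} pp")
```

Under these assumptions the 2-point gap is 4 instances, well under one standard error of the difference (roughly 4.9 pp), which fits describing the two configurations as matching rather than one beating the other. Because both runs use the same 200 instances, a paired test such as McNemar's on per-instance outcomes would be the more appropriate check; per-instance results are not given here.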
Abstract
Can an expensive AI model effectively direct a cheap one to solve software engineering tasks? We study this question by introducing ManagerWorker, a two-agent pipeline where an expensive "manager" model (text-only, no code execution) analyzes issues, dispatches exploration tasks, and reviews implementations, while a cheap "worker" model (with full repo access) executes code changes. We evaluate on 200 instances from SWE-bench Lite across five configurations that vary the manager-worker relationship, pipeline complexity, and model pairing. Our findings reveal both the promise and the limits of multi-agent direction: (1) a strong manager directing a weak worker (62%) matches a strong single agent (60%) at a fraction of the strong-model token usage, showing that expensive reasoning can substitute for expensive execution; (2) a weak manager directing a weak worker (42%) performs worse than…
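The abstract describes the division of labor (the manager analyzes issues, dispatches exploration tasks, and reviews implementations; the worker executes code changes) but not the exact control flow. The following is a minimal Python sketch of one plausible manager/worker loop, not the paper's implementation. The prompt prefixes (`ANALYZE`, `EXPLORE`, `INSTRUCT`, `IMPLEMENT`, `REVIEW`), the `APPROVE` convention, the `max_rounds` limit, the explicit instruction step, and the stub models `fake_manager` and `fake_worker` are all hypothetical. The stub worker also does not touch a real repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: a "model" is any function mapping a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Transcript:
    """Records what passed between manager and worker for one issue."""
    exploration_reports: List[str] = field(default_factory=list)
    patches: List[str] = field(default_factory=list)
    reviews: List[str] = field(default_factory=list)


def manager_worker(issue: str, manager: Model, worker: Model, max_rounds: int = 3) -> Transcript:
    t = Transcript()
    # Manager (text only, never executes code) analyzes the issue
    # and replies with one exploration task per line.
    plan = manager(f"ANALYZE: {issue}")
    tasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # Worker (the component that would have repo access) runs each task and reports back.
    for task in tasks:
        t.exploration_reports.append(worker(f"EXPLORE: {task}"))
    # Manager turns the reports into implementation instructions (assumed step).
    instructions = manager(f"INSTRUCT: {issue}\n" + "\n".join(t.exploration_reports))
    # Implement/review loop: stop when the manager approves or rounds run out.
    for _ in range(max_rounds):
        patch = worker(f"IMPLEMENT: {instructions}")
        t.patches.append(patch)
        review = manager(f"REVIEW: {patch}")
        t.reviews.append(review)
        if review.startswith("APPROVE"):
            break
        instructions = review  # the manager's feedback becomes the next instruction
    return t


# Hypothetical stub models so the sketch runs without any API.
def fake_manager(prompt: str) -> str:
    if prompt.startswith("ANALYZE"):
        return "find where parse() is defined\nlist callers of parse()"
    if prompt.startswith("INSTRUCT"):
        return "guard parse() against empty input"
    # REVIEW: approve only patches that add the requested guard
    return "APPROVE" if "guard" in prompt else "REVISE: add an empty-input guard"


def fake_worker(prompt: str) -> str:
    kind, _, body = prompt.partition(": ")
    if kind == "EXPLORE":
        return f"report on: {body}"
    return f"diff implementing: {body}"


result = manager_worker("parse() crashes on empty input", fake_manager, fake_worker)
print(len(result.exploration_reports), len(result.patches), result.reviews)
```

Keeping the manager's interface purely textual mirrors the split described in the abstract: only the worker would run commands or edit files, so the expensive model's tokens are spent on analysis and review rather than execution.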
