Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations

Rui Liu

arXiv:2603.26458 · cs.SE · March 30, 2026

TL;DR

This paper tests whether an expensive "manager" model can direct a cheap "worker" model on software engineering tasks. The hierarchy pays off with a strong manager and hurts with a weak one, and the author uses these differences across organizational structures to expose training gaps in current models.

Contribution

It introduces ManagerWorker, a two-agent pipeline in which a text-only manager model directs a worker model that has full repository access, and uses it to analyze how organizational structure affects performance and where current models' training falls short.

Findings

01

A strong manager directing a weak worker (62% on SWE-bench Lite) matches a strong single agent (60%) while using a fraction of the strong-model tokens.

02

A weak manager directing a weak worker (42%) performs worse than a weak agent alone.

03

Active direction and structured exploration significantly improve task success.

Abstract

Can an expensive AI model effectively direct a cheap one to solve software engineering tasks? We study this question by introducing ManagerWorker, a two-agent pipeline where an expensive "manager" model (text-only, no code execution) analyzes issues, dispatches exploration tasks, and reviews implementations, while a cheap "worker" model (with full repo access) executes code changes. We evaluate on 200 instances from SWE-bench Lite across five configurations that vary the manager-worker relationship, pipeline complexity, and model pairing. Our findings reveal both the promise and the limits of multi-agent direction: (1) a strong manager directing a weak worker (62%) matches a strong single agent (60%) at a fraction of the strong-model token usage, showing that expensive reasoning can substitute for expensive execution; (2) a weak manager directing a weak worker (42%) performs worse than…
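The division of labor described in the abstract can be sketched as a short control loop. This is only an illustrative sketch, not the paper's implementation: the function `run_manager_worker`, the `RunResult` type, the prompt prefixes (`ANALYZE`, `EXPLORE`, `IMPLEMENT`, `REVIEW`), the `APPROVE` convention, and the `max_rounds` limit are all hypothetical names and choices introduced here. The two models are represented by plain callables that map a prompt string to a response string.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for a model call: prompt text in, response text out.
ModelFn = Callable[[str], str]


@dataclass
class RunResult:
    steps: List[str] = field(default_factory=list)  # ordered phase names
    patch: Optional[str] = None                     # last patch the worker produced
    approved: bool = False                          # whether the manager approved it


def run_manager_worker(issue: str, manager: ModelFn, worker: ModelFn,
                       max_rounds: int = 3) -> RunResult:
    """Manager sees only text; worker is the only role that touches the repo."""
    result = RunResult()

    # Manager analyzes the issue (text-only, no code execution).
    plan = manager(f"ANALYZE\n{issue}")
    result.steps.append("analyze")

    # Manager's plan is dispatched to the worker as an exploration task.
    findings = worker(f"EXPLORE\n{plan}")
    result.steps.append("explore")

    feedback = ""
    for _ in range(max_rounds):  # assumed cap on review rounds
        # Worker implements a change, using the plan, findings, and any feedback.
        result.patch = worker(f"IMPLEMENT\n{plan}\n{findings}\n{feedback}")
        result.steps.append("implement")

        # Manager reviews the patch as text and either approves or asks for changes.
        verdict = manager(f"REVIEW\n{result.patch}")
        result.steps.append("review")
        if verdict.startswith("APPROVE"):
            result.approved = True
            break
        feedback = verdict
    return result


def demo() -> RunResult:
    """Run the loop with canned stub models (no real LLM calls)."""
    reviews = iter(["REVISE: handle empty input", "APPROVE"])

    def manager(prompt: str) -> str:
        if prompt.startswith("REVIEW"):
            return next(reviews)
        return "plan: guard the parser against empty input"

    def worker(prompt: str) -> str:
        if prompt.startswith("EXPLORE"):
            return "the parser lives in parse.py"
        return "diff --git a/parse.py b/parse.py"

    return run_manager_worker("Parser crashes on empty input", manager, worker)
```

In a setup like this, the expensive model is called only for analysis and review, so its token usage scales with the text it reads and writes rather than with repository exploration, which is the trade-off the abstract's first finding points to.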
