Self-Optimizing Multi-Agent Systems for Deep Research

Arthur Câmara; Vincent Slot; Jakub Zavrel

arXiv:2604.02988·cs.IR·April 6, 2026

TL;DR

This paper introduces self-optimizing multi-agent systems for deep research, where agents improve their performance through self-play and exploration, reducing reliance on static prompts and enhancing answer quality.

Contribution

The paper shows that letting agents self-play and explore prompt variations yields high-quality Deep Research systems that match or outperform static, hand-engineered prompts.

Findings

01. Self-play and exploration improve system performance.

02. Agents can match or outperform expert-crafted prompts.

03. Self-optimization reduces the need for manual prompt engineering.

Abstract

Given a user's complex information need, a multi-agent Deep Research system iteratively plans, retrieves, and synthesizes evidence across hundreds of documents to produce a high-quality answer. In one possible architecture, an orchestrator agent coordinates the process, while parallel worker agents execute tasks. Current Deep Research systems, however, often rely on hand-engineered prompts and static architectures, making improvement brittle, expensive, and time-consuming. We therefore explore various multi-agent optimization methods to show that enabling agents to self-play and explore different prompt combinations can produce high-quality Deep Research systems that match or outperform expert-crafted prompts.
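The orchestrator/worker layout and the idea of exploring prompt combinations can be illustrated with a small sketch. Everything below is an assumption made for illustration: the prompt strings, the `plan`, `work`, and `score` functions, and the exhaustive search over combinations are hypothetical stand-ins, not the optimization methods or prompts evaluated in the paper. The LLM calls are replaced by deterministic stubs so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical prompt candidates (assumption); a real system would
# propose and refine these automatically rather than hard-code them.
ORCHESTRATOR_PROMPTS = [
    "Split the question into sub-questions.",
    "Plan step by step, then split the question into sub-questions.",
]
WORKER_PROMPTS = [
    "Answer briefly.",
    "Answer and cite the supporting evidence.",
]


def plan(orchestrator_prompt, question):
    """Stub for the orchestrator LLM call: returns a list of sub-tasks."""
    n_tasks = 3 if "step by step" in orchestrator_prompt else 2
    return [f"{question} (part {i + 1})" for i in range(n_tasks)]


def work(worker_prompt, task):
    """Stub for one worker LLM call: returns a finding for one sub-task."""
    return {"task": task, "cited": "cite" in worker_prompt}


def run_system(orchestrator_prompt, worker_prompt, question):
    """Orchestrator plans, then workers execute the sub-tasks in parallel."""
    tasks = plan(orchestrator_prompt, question)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: work(worker_prompt, t), tasks))


def score(findings):
    """Toy quality metric (assumption): rewards coverage and cited findings."""
    return len(findings) + sum(f["cited"] for f in findings)


def optimize(question):
    """Try every prompt combination and keep the highest-scoring one."""
    return max(
        product(ORCHESTRATOR_PROMPTS, WORKER_PROMPTS),
        key=lambda combo: score(run_system(*combo, question)),
    )


if __name__ == "__main__":
    best_orchestrator, best_worker = optimize("What drives coral bleaching?")
    print(best_orchestrator)
    print(best_worker)
```

Exhaustive search is only feasible here because the toy space has four combinations. With realistic prompt spaces and costly LLM-based evaluation, a guided search would be needed instead, which is the kind of problem the optimization methods in the abstract address.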
