AgentSquare: Automatic LLM Agent Search in Modular Design Space
Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li

TL;DR
This paper introduces AgentSquare, a modular search framework for optimizing LLM-based agents through evolution and recombination within a unified design space, significantly improving performance across diverse tasks.
Contribution
It proposes a modular design space for LLM agents and a novel search framework with performance prediction, enabling automated optimization and insights into agent architecture.
Findings
Achieves an average performance gain of 17.2% over human-designed agents.
Outperforms existing hand-crafted agents across six benchmark scenarios.
Provides interpretable insights into agent design and performance.
Abstract
Recent advancements in Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. However, current research largely relies on manual, task-specific design, limiting their adaptability to novel tasks. In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with a uniform IO interface: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present a novel LLM agent search framework called AgentSquare, which introduces two core mechanisms, i.e., module evolution and recombination, to efficiently search for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs.…
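To make the search idea in the abstract concrete, here is a minimal sketch of recombination over a four-slot design space with a predictor that skips unpromising candidates. It is an illustration under stated assumptions, not the authors' implementation: the module names, the `TOY_SCORE` table, and the functions `recombine`, `toy_score`, and `search` are all hypothetical. In particular, the sketch enumerates single-slot swaps exhaustively instead of asking an LLM to propose them, it does not model module evolution (generating new modules), and it reuses the toy score as a stand-in for both the in-context surrogate predictor and the real benchmark evaluation.

```python
from typing import Callable, Dict, List, Tuple

# The four module slots named in the abstract. An agent is one module per slot.
SLOTS = ("planning", "reasoning", "tool_use", "memory")
Agent = Tuple[str, str, str, str]

# Hypothetical module pool and made-up per-module scores (illustration only).
POOL: Dict[str, List[str]] = {
    "planning": ["none", "decompose"],
    "reasoning": ["io", "cot"],
    "tool_use": ["none", "retrieval"],
    "memory": ["none", "vector"],
}
TOY_SCORE = {"none": 0, "io": 0, "decompose": 20, "cot": 30, "retrieval": 10, "vector": 10}


def recombine(agent: Agent, pool: Dict[str, List[str]]) -> List[Agent]:
    """Return every agent that differs from `agent` in exactly one slot."""
    candidates = []
    for i, slot in enumerate(SLOTS):
        for module in pool[slot]:
            if module != agent[i]:
                candidates.append(agent[:i] + (module,) + agent[i + 1:])
    return candidates


def toy_score(agent: Agent) -> int:
    """Stand-in for both the surrogate predictor and the real evaluation."""
    return sum(TOY_SCORE[m] for m in agent)


def search(
    start: Agent,
    pool: Dict[str, List[str]],
    predict: Callable[[Agent], int],
    evaluate: Callable[[Agent], int],
    rounds: int = 10,
) -> Tuple[Agent, int, int]:
    """Greedy search; returns (best agent, its score, number of evaluations)."""
    best, best_score = start, evaluate(start)
    evaluations = 1
    for _ in range(rounds):
        improved = False
        for cand in recombine(best, pool):
            # Skip candidates the predictor does not expect to beat the best.
            if predict(cand) <= best_score:
                continue
            score = evaluate(cand)  # the expensive step in a real setting
            evaluations += 1
            if score > best_score:
                best, best_score, improved = cand, score, True
        if not improved:
            break
    return best, best_score, evaluations


if __name__ == "__main__":
    start: Agent = ("none", "io", "none", "none")
    print(search(start, POOL, toy_score, toy_score))
```

In this toy setting the design space has 16 agents, and the skip rule lets the search reach the best one after evaluating only a subset of them. With a real, imperfect predictor the skip rule trades some risk of discarding a good design for lower evaluation cost.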
Peer Reviews
Decision: ICLR 2025 Poster
- Modularity and Reusability: The modular design allows the reuse and recombination of components, which aligns well with LLM advancements in modularization and scalability.
- Effective Search Mechanism: The combination of module evolution, recombination, and performance prediction seems to be a robust optimization strategy. The proposed performance predictor effectively reduces evaluation costs, addressing practical limitations in real-world deployments of LLM agents.
- Comprehensive Evaluation: Be
My main issue with the paper is that while the modular approach is beneficial in the short term, it may limit flexibility by enforcing predefined components. Extending the modular design to allow more dynamic, task-specific modules could enhance its applicability. Additionally, the framework’s reliance on LLM-driven suggestions for module evolution and recombination could inherit biases or inefficiencies from the LLMs themselves, potentially limiting the quality of novel configurations.
1. The paper is well-organized and uses language that is easy to understand.
2. The paper is well motivated, addressing an interesting research question. It innovatively consolidates existing (and potentially upcoming) LLM agent designs into a unified framework, and effectively leverages their successful experience for better design.
3. The experiments are conducted with both quantitative evaluations (including task performances, API costs and search trajectories), and qualitative analyses such
1. There are some inconsistencies within the paper that might cause confusion. For example, Equations (2) and (3) indicate that both module recombination and evolution take past experience as input, which is not the case in Figure 3. Besides, the random initialization (as mentioned in the experimental setup) seems to contradict the arguments made at the beginning of Section 3.3.
2. It seems fairer to also take into account the API cost incurred by search when you compare the performance-cost trad
This method shows its originality in using evolution and recombination mechanisms to automatically explore optimal combinations, effectively consolidating prior research efforts. It also provides empirical testing across diverse benchmarks, showing improvement over handcrafted and prior methods. The research has broad applications and the potential to unify efforts within the LLM agent community, reducing reliance on task-specific human design and enabling a more systematic expl
The paper lacks clarity regarding the definition of certain components of the method, particularly the performance evaluation function mentioned in Section 3.1. It is unclear whether this refers to the API cost introduced later or another evaluation metric. Additionally, the method shows limited novelty, as it primarily focuses on leveraging LLMs to recombine and select existing components rather than introducing a fundamentally new application or capability for LLMs. Although the proposed appro
Taxonomy
Topics: Manufacturing Process and Optimization · BIM and Construction Integration · Semantic Web and Ontologies
