AFlow: Automating Agentic Workflow Generation
Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, Chenglin Wu

TL;DR
AFlow is an automated framework that uses Monte Carlo Tree Search to generate and optimize agentic workflows with minimal human intervention, improving performance and reducing costs across multiple benchmarks.
Contribution
We reformulate workflow optimization as a search problem and introduce AFlow, which automates workflow generation using code modifications and feedback, outperforming existing methods.
Findings
AFlow achieves a 5.7% average improvement over state-of-the-art baselines.
Workflows found by AFlow enable smaller models to outperform GPT-4o on certain tasks at 4.55% of its inference cost.
Empirical evaluation across six datasets demonstrates AFlow's effectiveness.
Abstract
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification,…
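The search formulation described in the abstract can be illustrated with a minimal sketch. It is not AFlow's implementation: the workflow is reduced to a tuple of node names, the operator names in `OPERATORS` are hypothetical, `toy_modify` stands in for an LLM-proposed code modification, and `toy_evaluate` stands in for executing the workflow on a benchmark. The selection rule is plain UCB1, used here only as a generic Monte Carlo Tree Search choice.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a workflow is an ordered chain of node names.
Workflow = Tuple[str, ...]
OPERATORS = ("generate", "review", "revise", "ensemble")  # illustrative names only


@dataclass
class TreeNode:
    workflow: Workflow
    parent: Optional["TreeNode"] = None
    children: List["TreeNode"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0


def ucb(node: TreeNode, c: float = 1.4) -> float:
    # UCB1: mean score plus an exploration bonus that shrinks with visits.
    mean = node.total_score / node.visits
    return mean + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def backpropagate(node: Optional[TreeNode], score: float) -> None:
    # Push the evaluation result up to the root.
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent


def toy_modify(wf: Workflow, rng: random.Random) -> Workflow:
    # Stand-in for an LLM-proposed edit: replace one node or append a new one.
    op = rng.choice(OPERATORS)
    if len(wf) > 1 and rng.random() < 0.5:
        i = rng.randrange(len(wf))
        return wf[:i] + (op,) + wf[i + 1:]
    return wf + (op,)


def toy_evaluate(wf: Workflow) -> float:
    # Stand-in for benchmark execution: reward variety, penalize length (cost).
    return len(set(wf)) / len(OPERATORS) - 0.05 * len(wf)


def search(
    initial: Workflow,
    modify: Callable[[Workflow, random.Random], Workflow] = toy_modify,
    evaluate: Callable[[Workflow], float] = toy_evaluate,
    iterations: int = 30,
    max_children: int = 2,
    seed: int = 0,
) -> Tuple[Workflow, float]:
    rng = random.Random(seed)
    root = TreeNode(initial)
    best, best_score = initial, evaluate(initial)
    backpropagate(root, best_score)
    for _ in range(iterations):
        # Selection: descend through fully expanded nodes by UCB1.
        node = root
        while len(node.children) >= max_children:
            node = max(node.children, key=ucb)
        # Expansion: derive a child workflow by modifying the selected one.
        child = TreeNode(modify(node.workflow, rng), parent=node)
        node.children.append(child)
        # Evaluation and backpropagation of the observed score.
        score = evaluate(child.workflow)
        backpropagate(child, score)
        if score > best_score:
            best, best_score = child.workflow, score
    return best, best_score
```

Calling `search(("generate",))` returns the best workflow found and its score under the toy evaluator. In a real setting the evaluator would run the candidate workflow on held-out task data, which is where most of the cost of such a search would lie.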
Peer Reviews
Decision: ICLR 2025 Oral
Overall, this paper is clear, well-motivated and provides a new framework for automatic workflow optimization, which has significant potential impact on agent design and workflow optimization for the broader machine learning community. It proposes a novel, original approach to model the workflow as a sequence of LLM-invoking nodes in a graph structure, with prompts, operators, and code-represented edges in the search space. By leveraging MCTS, the paper reaches SOTA performance on major workflow
The paper could benefit from discussion of the following points: 1. To reduce the search space, the paper focuses on custom prompts, operators and code-represented edges by fixing parameters such as model choice, temperature and output format, which is a sound choice. Could there be more discussion on the potential effect of these parameters on model performance? 2. The authors mention some of the parameters used in MCTS in the appendix (e.g. $\lambda = 0.4$ used to balance explor
Novel Approach: AFLOW’s integration of MCTS with code-represented workflows introduces a new direction in automating LLM workflows. This reduces the reliance on manual design and allows efficient workflow discovery and optimization. Comprehensive Problem Formulation: The paper formalizes workflow optimization with a general mathematical framework, effectively unifying prior approaches and broadening the potential for future applications. Detailed Critique of Prior Work: The authors present an
Limited Scope and Generalizability: The paper primarily demonstrates AFLOW on benchmark tasks with clear success metrics, which raises questions about its applicability to more open-ended tasks, such as document generation or creative exploration. There is limited discussion on how the standardized prompts used in AFLOW would generalize to tasks without clear success criteria. The current prompts seem tailored to test-taking scenarios and may lack the flexibility required for tasks that demand c
The methodology is interesting, and the experimental setup is convincing in showing that AFlow enables smaller models to achieve superior performance to larger models. This lifts the cost/accuracy Pareto front. Given these results, I am appreciative of this work.
While the methodology and results are nice, I do, however, believe that the presentation of the paper requires improvement. The description of the AFlow methodology lacks precision and can at times even be called handwavy (details below), which makes the paper hard to read. If the authors are able to address those issues, I may be willing to increase my score. Some concrete examples (more examples in the questions): - There is a tree structure involved in the MCTS search process, but there is also
Taxonomy
Topics: Scientific Computing and Data Management · Business Process Modeling and Analysis · Simulation Techniques and Applications
