EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

Yang Cheng; Zilai Wang; Weiyu Ma; Wenhui Zhu; Yue Deng; Jian Zhao

arXiv:2508.09586 · cs.AI · August 21, 2025


4 Reviews

TL;DR

EvoCurr introduces a self-evolving curriculum framework where an LLM generates increasingly difficult problem instances to improve decision-making skills in other LLMs, significantly boosting success rates on complex benchmarks.

Contribution

The paper presents a novel self-evolving curriculum method using LLMs to generate tailored problem sequences, enhancing reasoning in complex decision tasks.

Findings

1. Significant improvement in task success rates.
2. Enhanced solution efficiency over baselines.
3. Effective dynamic difficulty adjustment.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, including programming, planning, and decision-making. However, their performance often degrades on highly complex problem instances that require deep reasoning over long horizons. In such cases, direct problem-solving approaches can be inefficient or fail outright because they lack structured intermediate guidance. To address this, we propose EvoCurr, a novel self-evolving framework in which a dedicated curriculum-generation LLM constructs a sequence of problem instances of gradually increasing difficulty, tailored to the solver LLM's learning progress. The curriculum adapts dynamically, easing challenges when the solver struggles and escalating them when success is consistent, thus maintaining an optimal learning trajectory. This approach enables the solver LLM, implemented as a…
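The adaptive rule described in the abstract (ease the task when the solver struggles, escalate it when success is consistent) can be sketched as a simple difficulty scheduler. This is a minimal sketch, not the authors' implementation: the integer difficulty `level`, the `streak_to_escalate` threshold, and the `run` helper are hypothetical, and in EvoCurr the designer LLM would generate a new problem instance at each step rather than just change a number.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AdaptiveCurriculum:
    """Hypothetical difficulty scheduler, not the authors' implementation."""
    level: int = 1
    min_level: int = 1
    max_level: int = 10
    streak_to_escalate: int = 2  # assumed: consecutive successes required before escalating
    streak: int = 0
    history: List[int] = field(default_factory=list)

    def update(self, solved: bool) -> int:
        """Record the attempted level, then escalate or ease it based on the outcome."""
        self.history.append(self.level)
        if solved:
            self.streak += 1
            if self.streak >= self.streak_to_escalate:
                self.level = min(self.level + 1, self.max_level)  # success is consistent: harder task
                self.streak = 0
        else:
            self.streak = 0
            self.level = max(self.level - 1, self.min_level)  # solver struggled: easier task
        return self.level


def run(curriculum: AdaptiveCurriculum, solver: Callable[[int], bool], steps: int) -> List[int]:
    """Drive the loop with a stand-in solver that maps a difficulty level to success/failure."""
    for _ in range(steps):
        curriculum.update(solver(curriculum.level))
    return curriculum.history


# A toy solver that can handle anything up to level 3.
cur = AdaptiveCurriculum()
print(run(cur, lambda lvl: lvl <= 3, steps=10))  # [1, 1, 2, 2, 3, 3, 4, 3, 3, 4]
print(cur.level)  # 3
```

With this toy solver the schedule climbs to the solver's competence boundary and then oscillates between levels 3 and 4, which is the behavior the abstract describes at a much coarser grain.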

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Inference-time curriculum with simple rules (accepted-floor + feasibility gate) removes the need for hand-crafted difficulty metrics and schedules.
- The behavior trees and the syntax & code critic make debugging and analysis practical.
- Compelling results on 12 SC2 tasks and Overcooked, with clear acceptance criteria and curriculum traces.
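Reviewer 01's "accepted-floor + feasibility gate" can be read as a two-part acceptance check on each task the designer proposes. The sketch below assumes one interpretation: a candidate is rejected if its difficulty falls below that of the last accepted task (the floor) or if a feasibility predicate fails. The `Task` type, its scalar `difficulty` score, the `GatedCurriculum` class, and the `is_feasible` callback are hypothetical stand-ins, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Task:
    name: str
    difficulty: float  # hypothetical scalar difficulty score


class GatedCurriculum:
    """Keeps the accepted task sequence and gates each new proposal (assumed rules)."""

    def __init__(self, is_feasible: Callable[[Task], bool]) -> None:
        self.is_feasible = is_feasible
        self.accepted: List[Task] = []

    @property
    def floor(self) -> float:
        # Difficulty of the most recently accepted task; no floor before the first one.
        return self.accepted[-1].difficulty if self.accepted else float("-inf")

    def propose(self, task: Task) -> bool:
        if task.difficulty < self.floor:
            return False  # rejected: would regress below the accepted floor
        if not self.is_feasible(task):
            return False  # rejected: fails the feasibility gate
        self.accepted.append(task)
        return True


# Usage: treat anything up to difficulty 5 as feasible (made-up check).
cur = GatedCurriculum(is_feasible=lambda t: t.difficulty <= 5)
results = [cur.propose(Task(n, d)) for n, d in [("t1", 2), ("t2", 1), ("t3", 7), ("t4", 4)]]
print(results)                         # [True, False, False, True]
print([t.name for t in cur.accepted])  # ['t1', 't4']
```

Keeping the floor as a property derived from the accepted list means no separate state can drift out of sync with the curriculum trace.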

Weaknesses

- The major concern is the limited set of baselines. To clarify the advantage of the proposed method, the authors should compare it against stronger curriculum RL baselines or strong search-based planners, beyond the “one-shot direct code” baseline.
- Directly using code as the policy may limit its capacity: behavior-tree size and LLM context length may cap the achievable complexity. Ablations on tree depth or lines of code versus performance would be useful.

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- The paper structure and presentation are clear and easy to follow.
- Figures and tables are well made, helping readers easily grasp the main idea.

Weaknesses

- The proposed approach is not really new; similar frameworks have been explored before. Applying it to LLMs does not by itself introduce anything new.
- The method relies heavily on heuristic choices and manually set hyperparameters (such as the difficulty measure $d(C)$ and the acceptance threshold $\tau$), which makes it seem overly simplistic.
- The experimental environments feel somewhat toy-like, raising questions about the generalizability of the results.
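The two quantities Reviewer 02 names enter the loop at different points: a difficulty measure $d(C)$ scores a curriculum task $C$, and a threshold $\tau$ decides whether the solver's performance on that task counts as success. The sketch below assumes $d(C)$ is a weighted sum of task features and that acceptance compares an empirical success rate against $\tau$; the function names, feature names, weights, and the default $\tau = 0.8$ are illustrative assumptions, not values from the paper.

```python
from typing import Mapping, Sequence


def difficulty(features: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Hypothetical d(C): a weighted sum of task features (missing features count as 0)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of evaluation episodes in which the solver's code succeeded."""
    if not outcomes:
        raise ValueError("need at least one evaluation episode")
    return sum(outcomes) / len(outcomes)


def accept(outcomes: Sequence[bool], tau: float = 0.8) -> bool:
    """Count the task as solved if the empirical success rate reaches tau (assumed rule)."""
    return success_rate(outcomes) >= tau


# Illustrative usage with made-up features and weights.
weights = {"enemy_units": 1.0, "map_size": 0.5}
print(difficulty({"enemy_units": 6, "map_size": 2}, weights))  # 7.0
print(accept([True] * 9 + [False]))                             # True  (0.9 >= 0.8)
print(accept([True] * 7 + [False] * 3))                         # False (0.7 < 0.8)
```

Both the weights inside $d(C)$ and the value of $\tau$ are hand-set in this sketch, which is exactly the kind of manual choice the reviewer objects to.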

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The paper is easy to follow and well written. The figures explain EvoCurr's components clearly, and the experimental setup and results are well presented.
- The designer-solver setup is a modular concept that connects curriculum learning to LLM-based solution generation.
- Using two difficult domains as examples of open-loop and closed-loop settings shows that the framework is general in certain respects.
- Empirical evidence shows that EvoCurr has benefits over direct baselines.

Weaknesses

- Although described as 'self-evolving', the method is not fully automated: the difficulty levels are predefined and depend not on the agent's capabilities but on heuristically chosen properties, so I would call EvoCurr semi-automated.
- The proposed framework does not seem as novel or generalizable as described, since it is a heuristic adaptation of existing ideas to two particular domains. For example, the enforced rules sound similar to return and task similarity constraints in self-paced le

Reviewer 04 · Rating 2 · Confidence 3

Strengths

1. The proposed framework can be applied to a wide range of other tasks.
2. Compared to the baseline proposed in the paper, significant improvements were achieved in both StarCraft and Overcooked tasks.

Weaknesses

1. Overall, the framework appears relatively simple, consisting solely of a designer and a solver, which may make it unsuitable for some tasks. The authors need to provide further evidence of the framework's applicability.
2. I have reservations about the novelty of the paper. It designs a curriculum learning framework and validates its effectiveness in the LLM domain. I personally argue that this framework represents a simplification of current codi
