SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang

TL;DR
SEAgent is a self-evolving agent framework that enables computer use agents to autonomously learn and adapt to new software environments through experiential learning, significantly improving success rates without human annotations.
Contribution
The paper introduces SEAgent, a novel autonomous learning framework with a curriculum generator and experiential policy updates, enabling CUAs to master unfamiliar software environments.
Findings
Raised success rate by 23.2 percentage points (from 11.3% to 34.5%) over the UI-TARS baseline.
Validated on five novel software environments within OS-World.
Developed a specialist-to-generalist training strategy for continuous evolution.
Abstract
Repurposing large vision-language models (LVLMs) as computer use agents (CUAs) has led to substantial breakthroughs, primarily driven by human-labeled data. However, these models often struggle with novel and specialized software, particularly in scenarios lacking human annotations. To address this challenge, we propose SEAgent, an agentic self-evolving framework enabling CUAs to autonomously evolve through interactions with unfamiliar software. Specifically, SEAgent empowers computer-use agents to autonomously master novel software environments via experiential learning, where agents explore new software, learn through iterative trial-and-error, and progressively tackle auto-generated tasks organized from simple to complex. To achieve this goal, we design a World State Model for step-wise trajectory assessment, along with a Curriculum Generator that generates increasingly diverse and…
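The loop the abstract describes (attempt a task, judge each step, update the policy, then move on to harder tasks) can be illustrated with a toy sketch. This is not the authors' implementation. Every name below (`Task`, `ToyAgent`, `judge_steps`, `next_task`, `self_evolve`) is hypothetical, the "policy" is a single integer, and the judge and curriculum are trivial stand-ins for the World State Model and the Curriculum Generator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    instruction: str
    difficulty: int  # number of steps the task needs; 1 is the simplest


class ToyAgent:
    """Hypothetical policy: can execute `skill` consecutive steps correctly."""

    def __init__(self) -> None:
        self.skill = 1

    def rollout(self, task: Task) -> List[bool]:
        # One entry per step: True if the step was executed correctly.
        return [step < self.skill for step in range(task.difficulty)]

    def update(self, judgments: List[bool]) -> None:
        # Stand-in for a policy update (assumption, not the paper's objective):
        # a trajectory with a failed step teaches the agent one more step.
        if not all(judgments):
            self.skill += 1


def judge_steps(trajectory: List[bool]) -> List[bool]:
    """Stand-in for a step-wise judge: labels every step correct or incorrect."""
    return list(trajectory)


def next_task(success_history: List[bool], level: int) -> Task:
    """Stand-in curriculum generator: raise difficulty after a success."""
    if success_history and success_history[-1]:
        level += 1
    return Task(instruction=f"toy task at level {level}", difficulty=level)


def self_evolve(agent: ToyAgent, iterations: int) -> List[int]:
    """Run the explore / judge / update loop; return the difficulty per round."""
    history: List[bool] = []
    levels: List[int] = []
    level = 1
    for _ in range(iterations):
        task = next_task(history, level)
        level = task.difficulty
        judgments = judge_steps(agent.rollout(task))
        history.append(all(judgments))
        agent.update(judgments)
        levels.append(level)
    return levels


if __name__ == "__main__":
    agent = ToyAgent()
    print(self_evolve(agent, 6))  # difficulty attempted in each round
    print(agent.skill)
```

In this toy, the curriculum only advances after a fully successful trajectory, and each failure triggers an update, so difficulty rises in a staircase pattern. A real system would replace the integer skill with an LVLM policy and the pass-through judge with a learned assessment model.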
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The method effectively reduces reliance on human-labeled data by allowing agents to learn from trial-and-error interactions in unfamiliar software environments.
1. If Large Vision-Language Models (LVLMs) can serve as World State Models, it means that LVLMs contain all the information for the application. So why not directly use LVLMs for decision-making? This completely contradicts the "without human intervention" description in Figure 1's caption and the motivation of this paper.
2. The World State Model relies on GPT-4o annotations, which may introduce bias and affect reproducibility.
- The lower part of Figure 1 presents the specialist-to-generalist training strategy with great clarity, and this strategy also provides valuable inspiration for progress in other domains. In general, each domain requires models to possess multi-dimensional capabilities, and the visualization and the demonstrated effectiveness of the proposed strategy in addressing this issue are particularly insightful and valuable.
- The paper provides a very clear introduction to the background of the task. A
- The difference between the specialist-to-generalist training strategy proposed in this paper and previous similar strategies requires further clarification.
- The readability of Section 3.1 and Figure 2 is not very good, and Figure 2 is not cited anywhere in the paper.
- There is a citation error at line 235.
### 1. **Innovation**
- **Self-evolving framework**: **SEAgent**, a **self-evolving framework** for autonomous exploration and experiential learning, is innovative in the CUA field. It allows agents to autonomously generate tasks and assess their success/failure in previously unfamiliar software environments without human intervention, advancing the capabilities of autonomous systems.
### 2. **Task Generation and Evaluation Precision**
- **World State Model**: The paper introduces the **Wo
1. **Inconsistency in the Paper's Claims** The paper's claims are not fully self-consistent. Although the paper's title suggests **self-evolving** agents and emphasizes the exploration of LVLMs for autonomous exploration, the **World State Model** used for exploration in this study still relies on **human-annotated high-quality datasets**. This contradicts the starting point outlined in **line 53**, which advocates for agents to evolve without such dependencies. The paper should clarify thi
1. This work identifies several key elements contributing to the development of autonomous agents for computer use without relying on annotated data, providing insights for practical implementation of such agents. The key elements include: (1) a new task generation strategy using a Curriculum Generator; (2) a new reward model to assess both the step-level actions and trajectory-level success using a World State Model; (3) a new training objective combining both adversarial imitation of failur
1. The writing of the paper is poor and requires considerable revision and polishing. (1) There are too many grammatical errors, making it hard to read. (2) The captions of most tables and figures are unclear and do not explain the details shown. For example, the meaning of the different curve colors in Figure 4 is not explained. (3) The experimental setup is missing from the main paper, including benchmarks, baselines, evaluation metrics, and implementation details, making it hard to make a fair c
