SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

Wanxin Tian; Shijie Zhang; Kevin Zhang; Xiaowei Chi; Chunkai Fan; Junyu Lu; Yulin Luo; Qiang Zhou; Yiming Zhao; Ning Liu; Siyu Lin; Zhiyuan Qin; Xiaozhu Ju; Shanghang Zhang; Jian Tang

arXiv:2506.21669 · cs.AI · October 28, 2025

TL;DR

SEEA-R1 is a reinforcement fine-tuning framework for self-evolving embodied agents. It combines tree-structured reasoning with multi-modal reward modeling to improve performance on long-horizon, real-world tasks.

Contribution

The paper proposes Tree-GRPO to strengthen multi-step reasoning and MGRM to improve reward generalization, enabling autonomous self-evolution in embodied agents.
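As a rough illustration of the idea behind a tree-structured, group-relative objective, the sketch below normalizes value estimates across sibling actions at a single node of a search tree, in the style of GRPO's group-relative advantage (each value minus the group mean, divided by the group standard deviation). This summary does not spell out how Tree-GRPO is actually computed, so treat this as an assumption-laden sketch rather than the authors' algorithm: the function name `group_relative_advantages`, the `eps` parameter, and the example actions and values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(values, eps=1e-8):
    """Normalize a group of scalar value estimates to zero mean and unit scale.

    Hypothetical helper: mirrors the GRPO-style group normalization,
    applied here to sibling actions of one tree node (an assumption).
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the group
    return [(v - mu) / (sigma + eps) for v in values]


# Hypothetical node in a search tree: candidate actions and their
# estimated values (for example, averaged returns from rollouts).
sibling_values = {"open fridge": 0.8, "go to sink": 0.2, "pick up mug": 0.5}

advantages = dict(
    zip(sibling_values, group_relative_advantages(list(sibling_values.values())))
)

for action, adv in advantages.items():
    print(f"{action}: {adv:+.3f}")
```

Under these assumptions, every step that has sibling alternatives receives its own relative signal, which is one way a tree of rollouts could provide intermediate learning signals when only a final task reward is available.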

Findings

01 Achieved state-of-the-art scores on the ALFWorld benchmark

02 Outperformed prior models, including GPT-4o, on multi-modal tasks

03 Demonstrated scalability and robustness without ground-truth rewards

Abstract

Self-evolution, the ability of agents to autonomously improve their reasoning and behavior, is essential for the embodied domain with long-horizon, real-world tasks. Despite current advancements in reinforcement fine-tuning (RFT) showing strong performance in enhancing reasoning in LLMs, its potential to enable self-evolving embodied intelligence with multi-modal interactions remains largely unexplored. Specifically, reinforcement fine-tuning faces two fundamental obstacles in embodied settings: (i) the lack of accessible intermediate rewards in multi-step reasoning tasks limits effective learning signals, and (ii) reliance on hand-crafted reward functions restricts generalization to novel tasks and environments. To address these challenges, we present Self-Evolving Embodied Agents-R1, SEEA-R1, the first RFT framework designed for enabling the self-evolving capabilities of embodied…
