Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization

Jianzong Wang; Botao Zhao; Yayun He; Junqing Peng; Xulong Zhang

arXiv:2604.13533·cs.RO·April 23, 2026

TL;DR

This paper introduces EEAgent, a self-evolving robotic framework using vision-language models and a reflection mechanism to improve adaptability and task success in complex environments.

Contribution

The paper presents a framework that combines VLMs with reflection-based prompt optimization, allowing robots to keep improving from experience without extensive retraining.

Findings

01. Sets a new state of the art on six VIMA-Bench tasks.

02. Outperforms baselines in complex robotic scenarios.

03. Enhances task success rates through reflection-based prompt refinement.

Abstract

Achieving general-purpose robotics requires empowering robots to adapt and evolve based on their environment and feedback. Traditional methods face limitations such as extensive training requirements, difficulties in cross-task generalization, and lack of interpretability. Prompt learning offers new opportunities for self-evolving robots without extensive training, but simply reflecting on past experiences is not sufficient: extracting meaningful insights from task successes and failures remains a challenge. To this end, we propose the evolvable embodied agent (EEAgent) framework, which leverages large vision-language models (VLMs) for better environmental interpretation and policy planning. To enhance reflection on past experiences, we propose a long short-term reflective optimization (LSTRO) mechanism that dynamically refines prompts based on both past experiences and newly learned lessons,…
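
The abstract describes LSTRO only at a high level, so the following is a minimal illustrative sketch of one way a prompt could be refined from both older lessons and recent reflections. It is not the authors' implementation. The class `ReflectivePrompt`, its methods, the fixed-size short-term window, and the promotion rule are all hypothetical, and the lesson strings are supplied by hand here, whereas in the paper they would presumably come from a VLM reflecting on an episode.

```python
from collections import deque


class ReflectivePrompt:
    """Hypothetical prompt store (not from the paper): a bounded window of
    recent reflections plus a list of older lessons."""

    def __init__(self, base_instruction, short_term_size=3):
        if short_term_size < 1:
            raise ValueError("short_term_size must be at least 1")
        self.base_instruction = base_instruction
        self.long_term = []                             # older, retained lessons
        self.short_term = deque(maxlen=short_term_size)  # most recent reflections

    def record(self, task, success, lesson):
        """Store one reflection; promote the oldest recent one when the window is full."""
        if len(self.short_term) == self.short_term.maxlen:
            oldest = self.short_term[0]
            # Assumed promotion rule: keep evicted reflections, skipping duplicates.
            if oldest not in self.long_term:
                self.long_term.append(oldest)
        outcome = "success" if success else "failure"
        # Appending to a full deque drops its leftmost (oldest) element.
        self.short_term.append(f"[{outcome}] {task}: {lesson}")

    def render(self):
        """Build the prompt text from the base instruction and both memories."""
        lines = [self.base_instruction]
        if self.long_term:
            lines.append("Lessons from past experience:")
            lines.extend(f"- {item}" for item in self.long_term)
        if self.short_term:
            lines.append("Recent reflections:")
            lines.extend(f"- {item}" for item in self.short_term)
        return "\n".join(lines)


if __name__ == "__main__":
    prompt = ReflectivePrompt("Plan the manipulation task step by step.", short_term_size=2)
    prompt.record("stack blocks", False, "grasp the lower block first")
    prompt.record("stack blocks", True, "align before releasing")
    prompt.record("rotate object", False, "check the target angle")
    print(prompt.render())
```

In this sketch the refined prompt is simply re-rendered before each new attempt, so no model weights change; only the text handed to the planner evolves, which matches the abstract's emphasis on avoiding extensive retraining.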
