Self-Improving Loops for Visual Robotic Planning
Calvin Luo, Zilai Zeng, Mingxi Jia, Yilun Du, Chen Sun

TL;DR
This paper introduces SILVR, a self-improving loop for visual robotic planning that iteratively enhances a video model's performance on robotic tasks through self-collected data, improving generalization and efficiency.
Contribution
The paper presents SILVR, a novel online self-improvement framework for visual robotic planning that updates models using self-produced trajectories without requiring ground-truth rewards or expert data.
Findings
Performance improves over iterations on unseen tasks.
Robustness without ground-truth rewards or expert demonstrations.
Outperforms alternative online experience methods.
Abstract
Video generative models trained on expert demonstrations have been utilized as performant text-conditioned visual planners for solving robotic tasks. However, generalization to unseen tasks remains a challenge. Whereas improved generalization may be facilitated by leveraging learned prior knowledge from additional pre-collected offline data sources, such as web-scale video datasets, in the era of experience we aim to design agents that can continuously improve in an online manner from self-collected behaviors. In this work we thus propose Self-Improving Loops for Visual Robotic Planning (SILVR), in which an in-domain video model iteratively updates itself on self-produced trajectories and steadily improves its performance for a specified task of interest. We apply SILVR to a diverse suite of MetaWorld tasks, as well as two manipulation tasks on a real robot arm, and find that…
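The abstract describes the loop only at a high level. The following is a hypothetical sketch of the generic control flow of such a self-improvement loop (generate a plan, execute it, keep the trajectories judged successful, fine-tune, repeat), not the authors' implementation. The callables `plan`, `execute`, `is_success` and `finetune`, the task name, and the toy "skill" model are all illustrative assumptions. In particular, the page states that SILVR does not require ground-truth rewards, so `is_success` stands in for whatever selection signal is actually used; keeping all accepted trajectories across iterations is likewise an assumption. Per one reviewer's description, `execute` would correspond to turning a generated video into actions with an inverse dynamics model (IDM).

```python
import random
from typing import Any, Callable, List, Tuple

Trajectory = List[Any]


def self_improving_loop(
    model: Any,
    task: str,
    num_iterations: int,
    rollouts_per_iter: int,
    plan: Callable[[Any, str], Any],
    execute: Callable[[Any], Trajectory],
    is_success: Callable[[Trajectory], bool],
    finetune: Callable[[Any, List[Trajectory]], Any],
) -> Tuple[Any, List[float]]:
    """Hypothetical loop: plan, act, filter, fine-tune on own data, repeat."""
    # Assumption: accepted trajectories accumulate across all iterations.
    dataset: List[Trajectory] = []
    success_rates: List[float] = []
    for _ in range(num_iterations):
        successes = 0
        for _ in range(rollouts_per_iter):
            video_plan = plan(model, task)    # stand-in for text-conditioned video generation
            trajectory = execute(video_plan)  # stand-in for acting on the plan (e.g. via an IDM)
            # Assumption: a hypothetical selection signal decides which rollouts to keep.
            if is_success(trajectory):
                dataset.append(trajectory)
                successes += 1
        success_rates.append(successes / rollouts_per_iter)
        model = finetune(model, dataset)      # update the model on its self-collected data
    return model, success_rates


# Toy usage: the "model" is a single hypothetical skill level in [0, 1].
rng = random.Random(0)


def toy_plan(skill: float, task: str) -> Tuple[str, bool]:
    return (task, rng.random() < skill)


def toy_execute(video_plan: Tuple[str, bool]) -> Trajectory:
    return list(video_plan)


def toy_is_success(trajectory: Trajectory) -> bool:
    return trajectory[1]


def toy_finetune(skill: float, dataset: List[Trajectory]) -> float:
    return min(1.0, skill + 0.01 * len(dataset))


final_skill, rates = self_improving_loop(
    0.3, "open-drawer", num_iterations=5, rollouts_per_iter=20,
    plan=toy_plan, execute=toy_execute,
    is_success=toy_is_success, finetune=toy_finetune,
)
print(final_skill, rates)
```

The toy fine-tuning step can only raise the skill level, so this sketch illustrates the control flow of the loop, not the learning dynamics or the performance plateau discussed in the reviews.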
Peer Reviews
Decision: ICLR 2026 Poster
- Using online interaction to improve visual planning is a novel and interesting research problem. - The approach is intuitive and performs well in practice. - Thorough experiments in both simulation and on real robots. The training tasks and test tasks are separated clearly, allowing a strict assessment of how well the method adapts to unseen tasks.
- I treat the method as two complementary parts: (a) improve the visual planner by successful trajectories rolled out with the help of a good IDM, and (b) improve the IDM with meaningful online behavior collected under the guidance of a good visual planner. Though the paper is mostly presented from perspective (a), it would be valuable to do some ablations to understand whether the empirical gains come mainly from (a), (b), or their combination. My feeling is that the answer may differ across se…
(1) As far as I'm aware, the idea of doing self-improvement of a high-level video generation policy in the robotic setting has not been explored yet, making this work novel in this regard. (2) The performance of the approach is shown to be better than two relevant self-improvement baselines, DSRL and regular filter-BC, which is a promising result. (3) The authors ablate many components of their approach, making the experimentation fairly thorough.
(1) While self-improvement is obtained, success rate on both MetaWorld and the real world seems to cap out at around 60 to 70%, and more self-improvement iterations do not improve performance. Intuitively, it would seem that there should be no cap on performance. Why does the approach not seem to be able to improve past this success rate? (2) The authors do not make a convincing argument about why filter-BC of a video generative high-level policy should be better than filter-BC of a regular sing…
1. The proposed SILVR is straightforward and easy to follow, leveraging iterative fine-tuning of a visual planner on self-collected experience without complex reward engineering. 2. The paper provides extensive experiments across both simulated (MetaWorld) and real-world (Franka Panda arm) settings, demonstrating broad applicability and robustness. 3. SILVR shows significant performance improvement on unseen tasks (up to 285% in MetaWorld) and outperforms reinforcement learning and behavior clon…
Since I am not an expert in this field, here are a few potential weaknesses I observed in the paper: 1. While the method is well executed and effective, the core idea of a self-improving loop (fine-tuning a model on its own successful outputs) is a well-established concept in other areas such as large language models. Applying this established concept to visual planning, while practical, may be perceived as a solid incremental advancement rather than a foundational shift in methodology. 2. Th…
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
