MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos
Xianghui Wang, Xinming Zhang, Yanjun Chen, Xiaoyu Shen, Wei Zhang

TL;DR
MA-ROESL introduces a motion-aware reward optimization framework that significantly improves training efficiency and skill reproduction in robot locomotion learning from single videos, addressing current bottlenecks in video-based robot training.
Contribution
It proposes a novel motion-aware frame selection method and a three-phase training pipeline, enhancing reward quality and training efficiency for robot skill learning from videos.
Findings
Enhanced training efficiency in simulated and real-world settings.
Faithful reproduction of locomotion skills from single videos.
Robust and scalable framework for robot skill learning.
Abstract
Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling method and low training efficiency of current methods remain a critical bottleneck, resulting in substantial computational overhead and time costs. To address this limitation, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method to implicitly enhance the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency via rapid reward optimization and derives the final policy through online fine-tuning. Experimental results demonstrate that MA-ROESL…
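The abstract does not say how motion-aware frame selection is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes motion can be approximated by the mean absolute pixel difference between consecutive frames, and it picks frames spaced evenly in cumulative motion rather than evenly in time. The function name `select_motion_aware_frames` and the parameter `k` are hypothetical.

```python
from __future__ import annotations

import numpy as np


def select_motion_aware_frames(frames: np.ndarray, k: int) -> list[int]:
    """Return up to k frame indices spaced evenly in cumulative motion.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Motion proxy (an assumption, not taken from the paper): mean absolute
    difference between consecutive frames.
    """
    num_frames = len(frames)
    if not 1 <= k <= num_frames:
        raise ValueError("k must be between 1 and the number of frames")

    flat = frames.astype(np.float64).reshape(num_frames, -1)
    # motion[i] measures the change from frame i to frame i + 1.
    motion = np.abs(np.diff(flat, axis=0)).mean(axis=1)
    # cumulative[i] is the total motion accumulated up to frame i.
    cumulative = np.concatenate([[0.0], np.cumsum(motion)])

    if cumulative[-1] == 0.0:
        # Static video: fall back to uniform temporal sampling.
        return np.linspace(0, num_frames - 1, k).round().astype(int).tolist()

    # Targets are evenly spaced in motion, so segments with more movement
    # receive more of the selected frames.
    targets = np.linspace(0.0, cumulative[-1], k)
    indices = np.searchsorted(cumulative, targets, side="left")
    indices = np.clip(indices, 0, num_frames - 1)
    # Duplicates can occur when motion is concentrated in a few transitions.
    return sorted(set(indices.tolist()))


if __name__ == "__main__":
    # Toy clip: five static frames followed by five frames of steady change.
    levels = np.array([0, 0, 0, 0, 0, 1, 2, 3, 4, 5], dtype=np.float64)
    clip = np.ones((10, 2, 2)) * levels[:, None, None]
    print(select_motion_aware_frames(clip, k=6))
```

On this toy clip, uniform sampling would spend roughly half of its frames on the static opening, whereas the motion-weighted version keeps one frame from the static part and the rest from the moving part. A real pipeline would more likely use optical flow or pose keypoints as the motion signal, which is again an assumption.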
