MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

Xianghui Wang; Xinming Zhang; Yanjun Chen; Xiaoyu Shen; Wei Zhang

arXiv:2505.08367·cs.RO·May 14, 2025

TL;DR

MA-ROESL is a motion-aware reward optimization framework for learning robot locomotion skills from a single video. It targets two bottlenecks of current VLM-based methods, improper frame sampling and low training efficiency, and reports improved training efficiency and faithful skill reproduction.

Contribution

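The paper proposes a motion-aware frame selection method, which implicitly improves the quality of VLM-generated reward functions, and a hybrid three-phase training pipeline that speeds up training through rapid reward optimization before deriving the final policy via online fine-tuning.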
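
The summary here does not say how motion is measured or how frames are chosen, so the following is only an illustrative sketch of what a motion-aware frame selector could look like, not the authors' method. It assumes grayscale frames given as nested lists, scores each frame by its mean absolute pixel difference from the previous frame, and keeps the `k` highest-scoring frames in temporal order. The names `motion_score` and `select_frames`, and the scoring rule itself, are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def select_frames(frames: Sequence[Frame], k: int) -> List[int]:
    """Return indices of the k frames with the largest motion, in temporal order.

    Assumption (not from the paper): motion is approximated by the
    difference to the immediately preceding frame.
    """
    if k <= 0 or not frames:
        return []
    # Frame 0 has no predecessor, so its score is defined as 0.
    scores = [0.0] + [
        motion_score(frames[i - 1], frames[i]) for i in range(1, len(frames))
    ]
    # Rank by score (descending); ties go to the earlier frame.
    ranked = sorted(range(len(frames)), key=lambda i: (-scores[i], i))
    # Restore temporal order so downstream consumers see a coherent sequence.
    return sorted(ranked[:k])
```

With five one-row frames `[[0, 0]]`, `[[0, 0]]`, `[[10, 10]]`, `[[10, 10]]`, `[[0, 0]]`, calling `select_frames(frames, 2)` returns `[2, 4]`, the two frames where the image changes. Uniform sampling over the same clip would pick frames regardless of whether anything moved, which is the kind of improper sampling the paper argues against.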

Findings

01. Enhanced training efficiency in simulated and real-world settings.

02. Faithful reproduction of locomotion skills from single videos.

03. Robust and scalable framework for robot skill learning.

Abstract

Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without the need for meticulous human-level reward design. However, the improper frame sampling and low training efficiency of current methods remain critical bottlenecks, resulting in substantial computational overhead and time costs. To address these limitations, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method to implicitly enhance the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency via rapid reward optimization and derives the final policy through online fine-tuning. Experimental results demonstrate that MA-ROESL…
