Multi-step manipulation task and motion planning guided by video demonstration

Kateryna Zorina; David Kovar; Mederic Fourmy; Florent Lamiraux; Nicolas Mansard; Justin Carpentier; Josef Sivic; Vladimir Petrik

arXiv:2505.08949·cs.RO·May 15, 2025

Open Access

TL;DR

This paper introduces a video-guided method for multi-step task and motion planning, built on an extended RRT algorithm. The method is demonstrated on complex robotic tasks and, with an added trajectory refinement step, on real robots.

Contribution

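The method extends RRT planning with contact states and 3D object poses extracted from a guiding video. This lets the planner solve multi-step tasks with sequential dependencies, and the authors also investigate how well it generalizes beyond the scene shown in the video.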

Findings

01 Effective planning for complex multi-step tasks demonstrated.
02 Successful application on multiple robot platforms.
03 Trajectory refinement improves real-world execution.

Abstract

This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm that allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach to go beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (I) 3D…
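
The idea of growing several trees at once, one around each state taken from the video, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes 2D points stand in for robot configurations, that the hypothetical `seeds` list holds the configurations at the grasp and release events in the order they occur in the video, and that the workspace is the unit square with no obstacles unless an `is_free` check is supplied. Contact states and object poses are not modeled. All function and parameter names are made up for illustration.

```python
import math
import random


def steer(a, b, step):
    """Move from point a toward point b by at most `step`."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return b
    return (a[0] + step * dx / dist, a[1] + step * dy / dist)


def grow_multi_tree(seeds, is_free=lambda p: True, step=0.1, iters=5000, rng=None):
    """Grow one tree per seed and report which consecutive trees got linked.

    seeds   -- hypothetical keyframe configurations (grasp/release events)
    is_free -- assumed collision check; the default accepts every point
    Returns a set of index pairs (i, i + 1) whose trees came within `step`.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    trees = [[s] for s in seeds]   # each tree is stored as a flat node list
    linked = set()
    for _ in range(iters):
        sample = (rng.random(), rng.random())  # uniform sample in the unit square
        for i, tree in enumerate(trees):
            # Standard RRT extension: nearest node, then a bounded step.
            nearest = min(tree, key=lambda n: math.dist(n, sample))
            new = steer(nearest, sample, step)
            if not is_free(new):
                continue
            tree.append(new)
            # Only trees that are adjacent in the video order are joined,
            # which keeps the sequence of grasp/release events intact.
            for j in (i - 1, i + 1):
                if 0 <= j < len(trees):
                    other = min(trees[j], key=lambda n: math.dist(n, new))
                    if math.dist(other, new) <= step:
                        linked.add((min(i, j), max(i, j)))
        if len(linked) == len(seeds) - 1:
            break  # every consecutive pair of trees is connected
    return linked


# Three hypothetical keyframes: grasp, release, grasp.
print(grow_multi_tree([(0.1, 0.1), (0.5, 0.5), (0.9, 0.1)]))
```

The sketch only records which trees were linked. A real planner would also store parent pointers to recover the path, check the connecting edges for collisions, and plan in the joint space of the robot and the manipulated objects.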

Taxonomy

Topics: Teleoperation and Haptic Systems · Human Motion and Animation · Advanced Vision and Imaging