Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori

TL;DR
This paper introduces a framework that leverages large-scale egocentric and exocentric video datasets to generate 6DoF object manipulation trajectories from textual action descriptions, addressing data scarcity in training interactive robots.
Contribution
It presents a novel approach to generate manipulation trajectories from action descriptions using large-scale video datasets and language models, establishing a new task and baseline.
Findings
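The models generate valid object trajectories when evaluated on the HOT3D dataset.
A new dataset for 6DoF trajectory generation is built from trajectories extracted from large-scale video and paired with textual action descriptions.
Baseline models for the task are established using visual and point cloud-based language models.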
Abstract
Learning to use tools or objects in common scenes, particularly handling them in various ways as instructed, is a key challenge for developing interactive robots. Training models to generate such manipulation trajectories requires a large and diverse collection of detailed manipulation demonstrations for various objects, which is nearly infeasible to gather at scale. In this paper, we propose a framework that leverages Ego-Exo4D, a large-scale ego- and exo-centric video dataset constructed globally with substantial effort, to extract diverse manipulation trajectories at scale. From these extracted trajectories and their associated textual action descriptions, we develop trajectory generation models based on visual and point cloud-based language models. On HOT3D, a recently proposed high-quality egocentric-vision trajectory dataset, we confirmed that our models successfully…
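To make the task's input and output concrete, here is a minimal sketch of how a 6DoF trajectory paired with an action description might be represented. The summary does not state the paper's actual data format, so everything below is an assumption made for illustration: the class names `Pose6DoF` and `ManipulationSample`, the rotation-vector (axis times angle) encoding, the metre and radian units, and the example description "pick up the cup" are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pose6DoF:
    """One object pose: 3 translational + 3 rotational degrees of freedom.

    Hypothetical layout, not the paper's format.
    """
    position: Tuple[float, float, float]         # x, y, z (metres assumed)
    rotation_vector: Tuple[float, float, float]  # rotation axis * angle (radians)


@dataclass
class ManipulationSample:
    """A textual action description paired with an object pose sequence."""
    action_description: str
    trajectory: List[Pose6DoF]


def path_length(trajectory: List[Pose6DoF]) -> float:
    """Total translational distance travelled along the trajectory."""
    return sum(
        math.dist(a.position, b.position)
        for a, b in zip(trajectory, trajectory[1:])
    )


def rotation_angle(pose: Pose6DoF) -> float:
    """Rotation magnitude in radians (the norm of the rotation vector)."""
    return math.hypot(*pose.rotation_vector)


# Hypothetical sample: lift the object 10 cm while turning it 90 degrees
# about the z-axis, then slide it 10 cm along x.
sample = ManipulationSample(
    action_description="pick up the cup",
    trajectory=[
        Pose6DoF((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
        Pose6DoF((0.0, 0.0, 0.1), (0.0, 0.0, math.pi / 2)),
        Pose6DoF((0.1, 0.0, 0.1), (0.0, 0.0, math.pi / 2)),
    ],
)
```

Under this sketch, a trajectory generation model would take `action_description` (together with visual or point cloud input) and produce the `trajectory` list. The helper functions only summarize a trajectory and are not evaluation metrics from the paper.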
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Human Motion and Animation
