Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian; Sizhe Yang; Jia Zeng; Ping Wang; Dahua Lin; Hao Dong; Jiangmiao Pang

arXiv:2412.15109·cs.RO·December 20, 2024

TL;DR

This paper introduces Seer, an end-to-end Predictive Inverse Dynamics Model (PIDM) that uses Transformers to forecast the robot's future visual states and predict actions conditioned on those forecasts, achieving state-of-the-art results in simulation and real-world manipulation tasks.

Contribution

The paper presents Seer, a predictive inverse dynamics model that is trained end-to-end, pre-trained on large-scale robotic datasets such as DROID, and reported to generalize and perform better than previous methods.

Findings

01. Achieves a 13% improvement on the LIBERO-LONG benchmark

02. Sets a new state of the art on CALVIN ABC-D with an average length of 4.28

03. Outperforms previous methods in real-world manipulation tasks

Abstract

Current efforts to learn scalable policies in robotic manipulation primarily fall into two categories: one focuses on "action," which involves behavior cloning from extensive collections of robotic data, while the other emphasizes "vision," enhancing model generalization by pre-training representations or generative models, also referred to as world models, using large-scale visual datasets. This paper presents an end-to-end paradigm that predicts actions using inverse dynamics models conditioned on the robot's forecasted visual states, named Predictive Inverse Dynamics Models (PIDM). By closing the loop between vision and action, the end-to-end PIDM can be a better scalable action learner. In practice, we use Transformers to process both visual states and actions, naming the model Seer. It is initially pre-trained on large-scale robotic datasets, such as DROID, and can be adapted to…
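
The two-stage structure described in the abstract (forecast a future state, then infer the action that reaches it) can be illustrated with a deliberately tiny sketch. This is not Seer's architecture: Seer uses Transformers over images and is trained end-to-end, whereas the example below assumes a toy environment whose state is a position vector and whose dynamics are `next_state = state + action`. The names `forecast_state`, `inverse_dynamics`, `pidm_policy`, and the `step` parameter are hypothetical and chosen only for illustration.

```python
from typing import List

State = List[float]
Action = List[float]


def true_dynamics(state: State, action: Action) -> State:
    """Toy environment (an assumption): the action is a displacement."""
    return [s + a for s, a in zip(state, action)]


def forecast_state(state: State, goal: State, step: float = 0.25) -> State:
    """Hypothetical foresight model: predict a state part of the way to the goal.

    In a PIDM this role is played by a learned predictor of future visual states.
    """
    return [s + step * (g - s) for s, g in zip(state, goal)]


def inverse_dynamics(state: State, predicted_next: State) -> Action:
    """Inverse dynamics for the toy environment.

    Returns the action that moves `state` to `predicted_next` under
    `true_dynamics`. In a PIDM this mapping is learned rather than hand-written.
    """
    return [n - s for s, n in zip(state, predicted_next)]


def pidm_policy(state: State, goal: State) -> Action:
    """Predict the next state first, then condition the action on that forecast."""
    predicted_next = forecast_state(state, goal)
    return inverse_dynamics(state, predicted_next)


def rollout(state: State, goal: State, steps: int) -> State:
    """Run the policy in closed loop against the toy dynamics."""
    for _ in range(steps):
        action = pidm_policy(state, goal)
        state = true_dynamics(state, action)
    return state


if __name__ == "__main__":
    final_state = rollout([0.0, 0.0], [1.0, -2.0], steps=20)
    print(final_state)
```

Because the action is always derived from the forecast, any error in the forecast propagates directly into the action. That coupling is the point of the sketch: in the paper's setting both stages are learned jointly rather than hand-coded as they are here.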

Code & Models

Repositories

openrobotlab/seer (PyTorch, official)

Taxonomy

Topics: Robot Manipulation and Learning · Robotic Mechanisms and Dynamics · Teleoperation and Haptic Systems