Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining

Wenyao Zhang; Bozhou Zhang; Zekun Qi; Wenjun Zeng; Xin Jin; Li Zhang

arXiv:2604.16391·cs.RO·April 21, 2026

TL;DR

DeFI is a disentangled pretraining framework for robot learning. It pretrains forward and inverse dynamics separately so that each can draw on its own data sources, which improves performance on CALVIN ABC-D, SimplerEnv-Fractal, and real-world tasks.

Contribution

The paper proposes a novel decoupled pretraining approach with separate models for forward and inverse dynamics, enhancing robot learning from large-scale, action-free videos.

Findings

01 Achieved state-of-the-art results on CALVIN ABC-D with an average task length of 4.51.

02 Attained a 51.2% success rate on the SimplerEnv-Fractal benchmark.

03 Reached an 81.3% success rate in real-world deployment.
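To make the "average task length" figure concrete, here is a minimal sketch of how such a metric is typically computed. It assumes the usual CALVIN long-horizon protocol, in which each evaluation chain consists of five consecutive tasks and a chain stops at the first failure. The rollout counts below are hypothetical illustration data, not results from the paper.

```python
def average_task_length(completed_per_chain):
    """Mean number of consecutive tasks completed per evaluation chain."""
    return sum(completed_per_chain) / len(completed_per_chain)


def success_rate_at(completed_per_chain, k):
    """Fraction of chains that completed at least k tasks in a row."""
    return sum(c >= k for c in completed_per_chain) / len(completed_per_chain)


# Hypothetical rollouts: tasks completed in a row for each 5-task chain.
chains = [5, 5, 4, 5, 3, 5, 5, 2, 5, 5]

avg_len = average_task_length(chains)

# The same value equals the sum of the per-step success rates for k = 1..5.
avg_len_from_rates = sum(success_rate_at(chains, k) for k in range(1, 6))

print(f"average task length: {avg_len:.2f}")
```

Under this assumed protocol the maximum possible score is 5, which is the scale against which a value such as 4.51 would be read.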

Abstract

Vision-language-action (VLA) models have shown great potential for building generalist robots, but they still face a dilemma: the misalignment between 2D image forecasting and 3D action prediction. Moreover, training vision and action in an entangled manner limits a model's ability to learn from large-scale, action-free web video data. To address these issues, we propose DeFI, a novel framework that Decouples visual Forward and Inverse dynamics pretraining so that each can exploit its own data sources, with video generation and action prediction disentangled. We introduce the General Forward Dynamics Model (GFDM), pretrained on diverse human and robot videos for future prediction, and the General Inverse Dynamics Model (GIDM), trained via self-supervised learning to infer latent actions from unlabeled video transitions. These models are then integrated into a unified architecture for end-to-end finetuning on downstream…
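The division of labor described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear maps, feature sizes, function names, and the way the two models are chained (predict a future frame, then infer the latent action that explains the transition, then decode a robot action) are all assumptions made for illustration, and language conditioning is omitted.

```python
import numpy as np

# Hypothetical sizes: visual feature, latent action, robot action.
FEAT_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 7

rng = np.random.default_rng(0)


def init(shape):
    """Small random weights standing in for pretrained parameters."""
    return rng.normal(scale=0.1, size=shape)


W_fwd = init((FEAT_DIM, FEAT_DIM))        # stand-in for a GFDM-like model
W_inv = init((2 * FEAT_DIM, LATENT_DIM))  # stand-in for a GIDM-like model
W_act = init((LATENT_DIM, ACTION_DIM))    # assumed action decoder


def forward_dynamics(obs):
    """Predict next-frame features from current features (no action labels)."""
    return obs + obs @ W_fwd


def inverse_dynamics(obs, next_obs):
    """Infer a bounded latent action from a frame transition."""
    return np.tanh(np.concatenate([obs, next_obs], axis=-1) @ W_inv)


def act(obs):
    """Assumed pipeline: imagine the future, then infer and decode the action."""
    predicted_next = forward_dynamics(obs)
    latent_action = inverse_dynamics(obs, predicted_next)
    return latent_action @ W_act


obs = rng.normal(size=(2, FEAT_DIM))  # batch of two hypothetical observations
actions = act(obs)
print(actions.shape)
```

The point of the sketch is that neither `forward_dynamics` nor `inverse_dynamics` needs action labels to be trained, so both could in principle be pretrained on video alone; only the final decoder requires robot action data.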

Code & Models

Repositories

logosroboticsgroup/DeFi (GitHub)

Videos

Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining (SlidesLive)