DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Zichen Jeff Cui; Hengkai Pan; Aadhithya Iyer; Siddhant Haldar; Lerrel Pinto

arXiv:2409.12192 · cs.RO · November 1, 2024

TL;DR

DynaMo introduces an in-domain self-supervised approach for learning visual representations from expert demonstrations, significantly enhancing imitation learning efficiency without relying on out-of-domain data or complex augmentations.

Contribution

DynaMo jointly learns latent inverse and forward dynamics models from in-domain data, improving visual representation quality for visuomotor control tasks.

Findings

01. DynaMo outperforms prior self-supervised methods in imitation learning tasks.

02. Representation quality improves across various policy architectures.

03. Ablation studies highlight key components contributing to performance gains.

Abstract

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or…
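The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear layers, the dimensions, and the names `linear`, `init`, and `dynamics_loss` are all hypothetical and chosen only to keep the example dependency-free. An encoder embeds each frame, an inverse dynamics model infers a latent action from two consecutive embeddings, and a forward dynamics model predicts the next embedding from the current embedding and that latent action. No ground-truth actions are used.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialization)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dynamics_loss(frames, enc_w, inv_w, fwd_w):
    """Mean squared error of next-embedding prediction over one trajectory."""
    # Encode every observation into a latent embedding z_t.
    z = [linear(enc_w, frame) for frame in frames]
    total, count = 0.0, 0
    for z_t, z_next in zip(z, z[1:]):
        # Inverse dynamics: (z_t, z_{t+1}) -> latent action (no action labels).
        latent_action = linear(inv_w, z_t + z_next)
        # Forward dynamics: (z_t, latent action) -> predicted z_{t+1}.
        z_pred = linear(fwd_w, z_t + latent_action)
        total += sum((p - q) ** 2 for p, q in zip(z_pred, z_next))
        count += len(z_next)
    return total / count


# Assumed toy sizes: 8-dim "frames", 4-dim embeddings, 2-dim latent actions.
rng = random.Random(0)
frames = [[rng.random() for _ in range(8)] for _ in range(5)]
enc_w = init(4, 8, rng)   # encoder: frame -> embedding
inv_w = init(2, 8, rng)   # inverse model: [z_t, z_next] -> latent action
fwd_w = init(4, 6, rng)   # forward model: [z_t, latent action] -> z_next
print(dynamics_loss(frames, enc_w, inv_w, fwd_w))
```

In this sketch an encoder that maps every frame to the same vector would drive the loss to zero, so a practical method needs some safeguard against collapsed representations; the sketch deliberately omits one, along with the gradient-based training loop.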

Code & Models

Models

jeffacce/dynamo_ssl (Hugging Face model)

Videos

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control · SlidesLive

Taxonomy

Topics: Virtual Reality Applications and Impacts · Human Motion and Animation · Advanced Vision and Imaging

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer