TL;DR
This paper introduces MSDP, a self-supervised multisensory pretraining framework that enhances contact-rich robotic manipulation by improving sensory integration, robustness, and sample efficiency in reinforcement learning.
Contribution
MSDP is a novel pretraining method using masked autoencoding and cross-modal prediction to learn expressive multisensory representations for robot control.
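The masked-autoencoding objective can be illustrated with a toy sketch. Sensor embeddings from several modalities are concatenated into one token sequence, a random subset is hidden, and the reconstruction loss is scored only on the hidden tokens. When every token of a modality is hidden (here force or proprioception, which have one token each), that modality must be predicted from the others, which is the cross-modal prediction the method relies on. Everything below is a hypothetical stand-in rather than the authors' implementation: the token layout (`MODALITY_TOKENS`), the mask ratio, the mean-pooling "encoder" that replaces the transformer, and the random per-modality decoder offsets are all assumptions.

```python
import numpy as np

# Hypothetical token layout: 4 vision patches, 1 force token, 1 proprioception token.
MODALITY_TOKENS = {"vision": 4, "force": 1, "proprioception": 1}
EMBED_DIM = 8


def random_token_mask(num_tokens, mask_ratio, rng):
    """Return a boolean array where True marks tokens hidden from the encoder."""
    num_masked = int(round(mask_ratio * num_tokens))
    # Keep at least one visible token (encoder input) and one masked token (loss target).
    num_masked = min(max(num_masked, 1), num_tokens - 1)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked tokens only, as in masked autoencoding."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


def toy_step(rng, mask_ratio=0.5):
    """One illustrative pretraining step on random stand-in sensor embeddings."""
    # Stand-in embeddings; a real system would produce these with per-modality encoders.
    tokens = np.concatenate(
        [rng.normal(size=(n, EMBED_DIM)) for n in MODALITY_TOKENS.values()]
    )
    modality_ids = np.concatenate(
        [np.full(n, i) for i, n in enumerate(MODALITY_TOKENS.values())]
    )
    mask = random_token_mask(len(tokens), mask_ratio, rng)
    # Placeholder "encoder": mean of the visible tokens (a transformer in the described method).
    context = tokens[~mask].mean(axis=0)
    # Placeholder "decoder": shared context plus a random per-modality offset
    # standing in for learned decoder weights.
    modality_bias = rng.normal(scale=0.1, size=(len(MODALITY_TOKENS), EMBED_DIM))
    pred = context[None, :] + modality_bias[modality_ids]
    return masked_reconstruction_loss(pred, tokens, mask), mask, modality_ids


rng = np.random.default_rng(0)
loss, mask, ids = toy_step(rng)
print("masked tokens:", mask, "loss:", loss)
```

Scoring the loss only on hidden tokens keeps the encoder from solving the task by copying its inputs. In a trained model the encoder and decoder weights would be learned by minimizing this loss, not sampled at random.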
Findings
MSDP accelerates learning in contact-rich tasks.
The approach is robust to sensor noise and dynamic changes.
MSDP achieves high success rates with minimal online interaction.
Abstract
Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, reinforcement learning (RL) agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to…
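The asymmetric actor-critic split described in the abstract can be sketched at the feature level. The critic runs cross-attention in which a small set of queries attends over the frozen encoder tokens. The actor receives a single pooled vector. This is a hypothetical NumPy sketch: single-head attention, the number of queries, mean pooling as the "pooled" operation, and concatenating the action onto the critic's features are all assumptions, and the abstract's description of the actor's input is truncated in the source.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention: queries attend over frozen tokens."""
    q = queries @ w_q                      # (num_queries, key_dim)
    k = tokens @ w_k                       # (num_tokens, key_dim)
    v = tokens @ w_v                       # (num_tokens, key_dim)
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)     # each query's weights sum to 1
    return weights @ v, weights


def init_critic_params(rng, embed_dim, key_dim, num_queries):
    """Hypothetical critic parameters; these would be trained while the encoder stays frozen."""
    return {
        "query": rng.normal(size=(num_queries, embed_dim)),
        "w_q": rng.normal(size=(embed_dim, key_dim)),
        "w_k": rng.normal(size=(embed_dim, key_dim)),
        "w_v": rng.normal(size=(embed_dim, key_dim)),
    }


def actor_features(tokens):
    """Actor input: a stable mean-pooled summary of the frozen embeddings (assumed pooling)."""
    return tokens.mean(axis=0)


def critic_features(tokens, action, params):
    """Critic input: attention-selected features from the tokens, concatenated with the action."""
    attended, _ = cross_attention(
        params["query"], tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    return np.concatenate([attended.reshape(-1), action])


rng = np.random.default_rng(0)
frozen_tokens = rng.normal(size=(6, 8))    # stand-in for the frozen encoder output
params = init_critic_params(rng, embed_dim=8, key_dim=4, num_queries=2)
print(actor_features(frozen_tokens).shape, critic_features(frozen_tokens, np.zeros(3), params).shape)
```

A plausible reading of this design, offered as interpretation rather than the authors' stated rationale: the critic's trainable queries can reweight tokens as the task demands, while the actor's pooled input stays unchanged by critic updates. That makes the policy's input distribution steadier during RL.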
