TL;DR
This paper develops a multimodal world model that combines visual and tactile data to better predict the outcomes of robot interactions, especially under physical ambiguity, and introduces two new datasets for evaluation.
Contribution
The paper presents a visuo-tactile predictive system and two robot-pushing datasets collected with a magnetic-based tactile sensor, and it examines when tactile feedback improves prediction of physical interactions.
Findings
Visuo-tactile prediction enhances accuracy in ambiguous interactions.
Tactile data provides limited benefits when object dynamics are visually clear.
New datasets isolate physical ambiguity and mirror existing benchmarks.
Abstract
Predicting the outcomes of robotic actions in complex environments, often referred to as learning a world model, remains a fundamental challenge in robotics. Existing approaches primarily rely on visual observations and action inputs to generate video-based predictions, frequently overlooking the critical role of tactile feedback in understanding physical interactions. In this work, we investigate the integration of tactile and visual information within predictive perception systems for physical robot interaction. We demonstrate that visuo-tactile prediction provides the greatest benefits in physically ambiguous interaction regimes, while improvements are naturally limited when object dynamics are visually inferable. Furthermore, we introduce two novel robot-pushing datasets collected using a magnetic-based tactile sensor for unsupervised learning. The first dataset comprises visually…
