MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception
Gyan Tatiya, Jonathan Francis, Ho-Hsiang Wu, Yonatan Bisk, Jivko, Sinapov

TL;DR
MOSAIC is a framework that enables robots to learn unified representations of object properties across visual, auditory, and haptic modalities, improving multi-sensory perception and task performance.
Contribution
It introduces a novel multi-sensory learning framework that distills knowledge from foundation models and aligns representations across multiple sensory modalities for robotics.
Findings
MOSAIC achieves competitive object categorization accuracy.
The framework excels in zero-shot object-fetching tasks.
Unified representations improve multi-sensory perception in robots.
Abstract
A holistic understanding of object properties across diverse sensory modalities (e.g., visual, audio, and haptic) is essential for tasks ranging from object categorization to complex manipulation. Drawing inspiration from cognitive science studies that emphasize the significance of multi-sensory integration in human perception, we introduce MOSAIC (Multimodal Object property learning with Self-Attention and Interactive Comprehension), a novel framework designed to facilitate the learning of unified multi-sensory object property representations. While it is undeniable that visual information plays a prominent role, we acknowledge that many fundamental object properties extend beyond the visual domain to encompass attributes like texture, mass distribution, or sounds, which significantly influence how we interact with objects. In MOSAIC, we leverage this profound insight by distilling…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Robot Manipulation and Learning
