Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning
Maximilian Du, Olivia Y. Lee, Suraj Nair, Chelsea Finn

TL;DR
This paper introduces a system that combines visual and audio data for robotic manipulation tasks, demonstrating improved success rates in partially-observed scenarios through imitation learning and human interventions.
Contribution
The paper presents an approach that integrates audio feedback with visual data for imitation learning in robotic manipulation, particularly under visual occlusion.
Findings
Using audio improves success rates on the simulated tasks.
Online human interventions improve the success rate of offline imitation learning by ~20%.
The system achieves a 70% success rate on real-robot tasks, outperforming policies that do not use audio.
Abstract
Humans are capable of completing a range of challenging manipulation tasks that require reasoning jointly over modalities such as vision, touch, and sound. Moreover, many such tasks are partially-observed; for example, taking a notebook out of a backpack will lead to visual occlusion and require reasoning over the history of audio or tactile information. While robust tactile sensing can be costly to capture on robots, microphones near or on a robot's gripper are a cheap and easy way to acquire audio feedback of contact events, which can be a surprisingly valuable data source for perception in the absence of vision. Motivated by the potential for sound to mitigate visual occlusion, we aim to learn a set of challenging partially-observed manipulation tasks from visual and audio inputs. Our proposed system learns these tasks by combining offline imitation learning from a modest number of…
Taxonomy
Topics: Tactile and Sensory Interactions · Music Technology and Sound Studies · Robot Manipulation and Learning
