Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

TL;DR
This paper presents Move2Hear, a reinforcement learning-based system enabling an agent to move intelligently in 3D environments to improve audio source separation by actively positioning its sensors.
Contribution
It introduces the active audio-visual source separation problem and a reinforcement learning approach to optimize agent movements for better sound isolation.
Findings
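The learned policy finds movement sequences that improve separation of the target sound within a limited time budget.
It is evaluated in two settings: augmented reality, where the system starts co-located with the target object, and mobile robotics, where the agent starts arbitrarily far from it.
Results are demonstrated in simulated 3D environments.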
Abstract
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and it must use its eyes and ears to automatically separate out the sounds originating from a target object within a limited time budget. Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted audio separation quality. We demonstrate our approach in scenarios motivated by both augmented reality (system is already co-located with the target object) and mobile robotics (agent begins arbitrarily far from the target object). Using…
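The abstract says the movement policy is guided by the improvement in predicted audio separation quality. A minimal sketch of that idea follows, assuming quality is measured as a mean squared error against a reference signal and that the per-step reward is the drop in that error. The function names and the choice of error metric are illustrative assumptions, not the paper's actual formulation.

```python
from typing import List, Sequence


def separation_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and a reference target signal.

    Assumption: MSE stands in for whatever separation-quality measure is used.
    """
    if len(predicted) != len(target):
        raise ValueError("signals must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def improvement_rewards(errors: Sequence[float]) -> List[float]:
    """Per-step reward: the decrease in error from one step to the next.

    errors[0] is the error before the agent moves; errors[t] is the error
    after step t. A positive reward means the move improved separation.
    """
    return [prev - curr for prev, curr in zip(errors, errors[1:])]


if __name__ == "__main__":
    # Hypothetical error trace over a three-step time budget.
    trace = [0.5, 0.25, 0.25, 0.125]
    print(improvement_rewards(trace))
```

With this reward shape the per-step rewards telescope, so their sum equals the total error reduction over the episode, which favors trajectories that reach good listening positions within the time budget.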
