Move2Hear: Active Audio-Visual Source Separation

Sagnik Majumder; Ziad Al-Halah; Kristen Grauman

arXiv:2105.07142 · cs.CV · August 27, 2021

TL;DR

This paper presents Move2Hear, a reinforcement learning approach in which an agent moves intelligently through 3D environments and repositions its camera and microphones to better separate the sounds coming from a target source.

Contribution

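It introduces the active audio-visual source separation problem, together with a reinforcement learning approach that trains movement policies to improve the isolation of the target sound, using the improvement in predicted separation quality as the training signal.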

Findings

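1. The learned policies find movement sequences that improve separation of the target audio source.

2. The approach performs well both in augmented reality scenarios, where the system starts co-located with the target object, and in mobile robotics scenarios, where the agent starts arbitrarily far from it.

3. Experiments in state-of-the-art audio-visual simulations of 3D environments demonstrate these capabilities.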

Abstract

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and it must use its eyes and ears to automatically separate out the sounds originating from a target object within a limited time budget. Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted audio separation quality. We demonstrate our approach in scenarios motivated by both augmented reality (system is already co-located with the target object) and mobile robotics (agent begins arbitrarily far from the target object). Using…
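The training signal described in the abstract, rewarding the agent for improving predicted separation quality, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes separation quality is scored with scale-invariant SDR (SI-SDR) against the ground-truth target waveform, and that the per-step reward is the change in that score between consecutive steps. The function names `si_sdr` and `separation_rewards` are hypothetical.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Assumption: SI-SDR stands in for whatever quality measure the paper uses.
    """
    # Remove DC offsets so the score ignores constant shifts.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the "target" component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(distortion, distortion) + eps)))


def separation_rewards(estimates, reference):
    """Hypothetical per-step reward: change in separation quality between steps.

    `estimates` holds the separated target waveform predicted at each step of
    an episode, i.e. after each movement of the agent's camera and microphones.
    """
    qualities = [si_sdr(e, reference) for e in estimates]
    return [q_next - q_prev for q_prev, q_next in zip(qualities, qualities[1:])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(16000)  # stand-in for 1 s of target audio
    noise = rng.standard_normal(16000)      # stand-in for interfering sources
    # Simulate an agent whose estimates get cleaner as it moves.
    estimates = [reference + level * noise for level in (1.0, 0.5, 0.25)]
    # Both rewards are expected to be positive, since each estimate is cleaner.
    print(separation_rewards(estimates, reference))
```

Because each reward is a difference of consecutive scores, the rewards over an episode sum to the total quality gain from the first step to the last, so a policy that maximizes return is pushed toward positions that yield the best separation within the time budget.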
