Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization

Dennis Fedorishin; Deen Dayal Mohan; Bhavin Jawade; Srirangaraj Setlur; Venu Govindaraju

arXiv:2211.03019·cs.CV·November 8, 2022

TL;DR

This paper introduces an optical flow-based self-supervised method for localizing sound sources in videos. It uses motion information as a prior and reports state-of-the-art results on the Soundnet Flickr and VGG Sound Source datasets.

Contribution

The paper models optical flow as a prior for sound source localization; the added flow-based attention substantially improves localization without explicit annotations.

Findings

1. Achieves state-of-the-art results on the Soundnet Flickr dataset.
2. Demonstrates that flow-based attention improves localization accuracy.
3. Also achieves state-of-the-art performance on the VGG Sound Source dataset.

Abstract

Learning to localize the sound source in videos without explicit annotations is a novel area of audio-visual research. Existing work in this area focuses on creating attention maps to capture the correlation between the two modalities to localize the source of the sound. In a video, oftentimes, the objects exhibiting movement are the ones generating the sound. In this work, we capture this characteristic by modeling the optical flow in a video as a prior to better aid in localizing the sound source. We further demonstrate that the addition of flow-based attention substantially improves visual sound source localization. Finally, we benchmark our method on standard sound source localization datasets and achieve state-of-the-art performance on the Soundnet Flickr and VGG Sound Source datasets. Code: https://github.com/denfed/heartheflow.
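To make the idea in the abstract concrete, the following is a minimal sketch of how a motion prior could reweight an audio-visual similarity map. It is not the authors' architecture (see the linked repository for that). The function names, the `alpha` weight, and the multiplicative combination are all assumptions chosen for illustration, and the inputs stand in for features that a real system would obtain from learned encoders and an optical flow estimator.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def flow_weighted_map(visual, audio, flow_mag, alpha=1.0):
    """Hypothetical helper: score each spatial location.

    visual   -- H x W grid of C-dimensional visual feature vectors
    audio    -- one C-dimensional audio embedding
    flow_mag -- H x W grid of non-negative optical flow magnitudes
    alpha    -- assumed strength of the motion prior (illustrative only)
    """
    height, width = len(visual), len(visual[0])
    # Normalise flow to [0, 1]; fall back to 1.0 when there is no motion.
    max_flow = max(max(row) for row in flow_mag) or 1.0
    scores = []
    for i in range(height):
        row = []
        for j in range(width):
            # Audio-visual agreement, clamped so the prior never boosts
            # locations that disagree with the audio.
            sim = max(cosine(visual[i][j], audio), 0.0)
            # Assumed combination: moving regions get up to (1 + alpha)x weight.
            row.append(sim * (1.0 + alpha * flow_mag[i][j] / max_flow))
        scores.append(row)
    return scores

def argmax2d(grid):
    """Return (row, col) of the first maximum entry."""
    best, best_pos = grid[0][0], (0, 0)
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value > best:
                best, best_pos = value, (i, j)
    return best_pos

# Toy 2 x 2 frame: two locations match the audio equally well,
# but only one of them is moving.
visual = [[[1.0, 0.0], [1.0, 0.0]],
          [[0.0, 1.0], [0.5, 0.5]]]
audio = [1.0, 0.0]
flow_mag = [[0.0, 2.0],
            [2.0, 0.0]]

scores = flow_weighted_map(visual, audio, flow_mag)
predicted = argmax2d(scores)
```

In this toy input the two top-row locations are tied on audio-visual similarity alone, and the flow term breaks the tie in favor of the moving one. The bottom-left location moves but does not match the audio, so its score stays at zero.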

Code & Models

Repositories

denfed/heartheflow (PyTorch, official)

Videos

Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization (YouTube)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Max Pooling · Dense Connections · Dropout · Convolution · Softmax