Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction

Hailong Ning, Bin Zhao, Zhanxuan Hu, Lang He, and Ercheng Pei

arXiv:2109.08371·cs.CV·May 3, 2022

Open Access

TL;DR

This paper introduces an audio-visual collaborative learning framework for dynamic saliency prediction, leveraging audio cues alongside visual data to enhance scene understanding and improve prediction accuracy.

Contribution

The paper proposes a three-part method (audio-visual encoding, audio-visual location, and collaborative integration) that encodes, locates, and integrates audio-visual information for better dynamic saliency prediction.

Findings

01. Outperforms existing DSP models on six challenging datasets.

02. Effectively locates sound sources within visual scenes.

03. Enhances saliency prediction accuracy by leveraging audio cues.

Abstract

The Dynamic Saliency Prediction (DSP) task simulates the human selective attention mechanism for perceiving dynamic scenes, which is important in many vision tasks. Most existing methods consider only visual cues and neglect the accompanying audio information, which can provide complementary information for scene understanding. In fact, auditory and visual cues are strongly related, and humans generally perceive the surrounding scene by sensing these cues collaboratively. Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which explores the audio modality to assist the vision modality in predicting the dynamic saliency map. The proposed method consists of three parts: 1) audio-visual encoding, 2) audio-visual location, and 3) collaborative integration. Firstly, a refined…
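The three-part structure named in the abstract can be pictured with a toy sketch. This is not the paper's architecture: the abstract is truncated before it describes any component, so everything below is an assumption made for illustration. The function names `locate_sound` and `integrate` are hypothetical, cosine similarity stands in for audio-visual location, and a simple multiplicative boost stands in for collaborative integration. The encoding stage is skipped, and hand-written feature vectors play the role of encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def locate_sound(visual, audio):
    """Hypothetical location step: score each spatial cell by how well
    its visual feature matches the audio feature (negatives clipped to 0)."""
    return [[max(cosine(cell, audio), 0.0) for cell in row] for row in visual]


def integrate(visual, location):
    """Hypothetical integration step: weight each cell's visual feature norm
    by (1 + location score), then normalize the map to sum to 1."""
    raw = [
        [math.sqrt(sum(c * c for c in cell)) * (1.0 + w)
         for cell, w in zip(vrow, lrow)]
        for vrow, lrow in zip(visual, location)
    ]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw] if total > 0 else raw


# Stand-in for the encoding step: a 2x2 grid of 2-D visual features
# and one 2-D audio feature, both assumed to come from some encoder.
visual = [[[1.0, 0.0], [0.0, 1.0]],
          [[1.0, 1.0], [0.0, 0.0]]]
audio = [0.0, 1.0]

location = locate_sound(visual, audio)   # where the sound plausibly comes from
saliency = integrate(visual, location)   # audio-weighted saliency map
```

In this toy setup, cells whose visual feature aligns with the audio feature receive a higher location score and therefore a larger share of the final map, which is the general idea of letting audio assist the visual prediction.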

Taxonomy

Topics: Visual Attention and Saliency Detection · Multisensory perception and integration · Olfactory and Sensory Function Studies