Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning, Bin Zhao, Zhanxuan Hu, Lang He, and Ercheng Pei

TL;DR
This paper introduces an audio-visual collaborative learning framework for dynamic saliency prediction, leveraging audio cues alongside visual data to enhance scene understanding and improve prediction accuracy.
Contribution
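It proposes a three-part method (audio-visual encoding, audio-visual location, and collaborative integration) that encodes both modalities, locates the sound source in the visual scene, and integrates the two for better dynamic saliency prediction.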
Findings
Outperforms existing DSP models on six challenging datasets.
Effectively locates sound sources within visual scenes.
Enhances saliency prediction accuracy by leveraging audio cues.
Abstract
The Dynamic Saliency Prediction (DSP) task simulates the human selective attention mechanism to perceive dynamic scenes, which is important in many vision tasks. Most existing methods consider only visual cues and neglect the accompanying audio, which can provide complementary information for scene understanding. In fact, auditory and visual cues are strongly related, and humans generally perceive their surroundings by sensing these cues jointly. Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which uses the audio modality to assist the visual modality and better predict the dynamic saliency map. The proposed method consists of three parts: 1) audio-visual encoding, 2) audio-visual location, and 3) collaborative integration. Firstly, a refined…
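To make the "audio-visual location" and "collaborative integration" ideas concrete, here is a minimal toy sketch. It is not the authors' architecture, whose details are truncated above. It assumes that each spatial position of a frame already has a feature vector and that the audio clip has been embedded into the same space. The function names (`locate_sound_source`, `fuse`), the cosine-similarity localisation, and the weight `alpha` are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def locate_sound_source(visual_feats, audio_vec):
    """Assumed localisation step: score each position by similarity to the audio.

    visual_feats is an H x W grid of feature vectors; the result is an H x W
    map in [0, 1], with negative similarities clipped to zero.
    """
    return [[max(0.0, cosine(f, audio_vec)) for f in row] for row in visual_feats]

def fuse(visual_saliency, location_map, alpha=0.5):
    """Assumed integration step: boost visual saliency where the audio localises,
    then rescale so the maximum value is 1."""
    fused = [[s * (1.0 + alpha * l) for s, l in zip(srow, lrow)]
             for srow, lrow in zip(visual_saliency, location_map)]
    peak = max(max(row) for row in fused)
    return [[x / peak for x in row] for row in fused] if peak > 0 else fused

# Toy 2 x 2 frame with 2-dimensional features (made-up numbers).
visual_feats = [[[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 1.0], [-1.0, 0.0]]]
audio_vec = [0.0, 2.0]
visual_saliency = [[0.5, 0.5], [0.5, 0.5]]

location_map = locate_sound_source(visual_feats, audio_vec)
fused_map = fuse(visual_saliency, location_map)
```

In this toy case the visual-only saliency is uniform, so the fused map peaks at the position whose features best match the audio embedding. A learned model would replace both the similarity measure and the fusion rule with trained components.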
Taxonomy
Topics: Visual Attention and Saliency Detection · Multisensory perception and integration · Olfactory and Sensory Function Studies
