DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction
Hamed R. Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala

TL;DR
This paper introduces DAVE, a simple deep learning model that integrates audio and visual cues to improve dynamic saliency prediction in videos, demonstrating audio's significant contribution to gaze prediction accuracy.
Contribution
The paper presents a novel, simple deep audio-visual embedding model for dynamic saliency prediction and provides an extensive analysis of audio's role in enhancing saliency models.
Findings
Audio significantly improves saliency prediction accuracy.
Salient sound sources naturally attract visual attention.
The audio-visual model outperforms visual-only models on over 53% of frames.
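A per-frame comparison like the one behind the last finding can be sketched as follows. This is only an illustration under stated assumptions: the function name `fraction_of_frames_improved` is hypothetical, the scores are made-up numbers rather than results from the paper, and the metric is assumed to be one where higher is better (the summary does not say which metric was used).

```python
def fraction_of_frames_improved(av_scores, visual_scores):
    """Return the fraction of frames where the audio-visual score is strictly higher.

    Assumes both lists hold one score per frame, in the same frame order,
    for a metric where a higher value means a better prediction.
    """
    if len(av_scores) != len(visual_scores):
        raise ValueError("score lists must cover the same frames")
    if not av_scores:
        raise ValueError("at least one frame is required")
    # Count frames where the audio-visual model wins; ties do not count.
    wins = sum(1 for a, v in zip(av_scores, visual_scores) if a > v)
    return wins / len(av_scores)


# Hypothetical per-frame scores, invented for illustration only.
av = [0.62, 0.55, 0.71, 0.40, 0.66]
vis = [0.60, 0.58, 0.65, 0.40, 0.61]
print(fraction_of_frames_improved(av, vis))  # 3 of 5 frames -> 0.6
```

Counting ties as non-wins is a design choice of this sketch; a stricter or looser rule would change the reported percentage slightly.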
Abstract
This paper studies audio-visual deep saliency prediction. It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction dubbed "DAVE" in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named "AVE". Despite the strong relation between auditory and visual cues for guiding gaze during perception, video saliency models consider only visual cues and neglect the auditory information that is ubiquitous in dynamic scenes. Here, we investigate the applicability of audio cues in conjunction with visual ones in predicting saliency maps using deep neural networks. To this end, the proposed model is intentionally designed to be simple. Two baseline models are developed on the same architecture, which consists of an encoder-decoder. The encoder projects the input into a feature space followed by a decoder that…
Taxonomy
Topics: Visual Attention and Saliency Detection · Multisensory perception and integration · Olfactory and Sensory Function Studies
