DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

Hamed R. Tavakoli; Ali Borji; Esa Rahtu; Juho Kannala

arXiv:1905.10693 · cs.CV · January 9, 2020 · 29 cites

TL;DR

This paper introduces DAVE, a simple deep learning model that integrates audio and visual cues to improve dynamic saliency prediction in videos, demonstrating audio's significant contribution to gaze prediction accuracy.

Contribution

The paper presents a novel, simple deep audio-visual embedding model for dynamic saliency prediction and provides an extensive analysis of audio's role in enhancing saliency models.

Findings

01. Audio significantly improves saliency prediction accuracy.

02. Salient sound sources naturally attract visual attention.

03. Audio-visual model outperforms visual-only models on over 53% of frames.

Abstract

This paper studies audio-visual deep saliency prediction. It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction, dubbed "DAVE", in conjunction with our efforts towards building an Audio-Visual Eye-tracking corpus named "AVE". Despite the strong relation between auditory and visual cues in guiding gaze during perception, video saliency models consider only visual cues and neglect the auditory information that is ubiquitous in dynamic scenes. Here, we investigate the applicability of audio cues in conjunction with visual ones in predicting saliency maps using deep neural networks. To this end, the proposed model is intentionally designed to be simple. Two baseline models are developed on the same architecture, which consists of an encoder-decoder. The encoder projects the input into a feature space followed by a decoder that…

Taxonomy

Topics: Visual Attention and Saliency Detection · Multisensory perception and integration · Olfactory and Sensory Function Studies