Audiovisual Saliency Prediction in Uncategorized Video Sequences based on Audio-Video Correlation
Maryam Qamar Butt, Anis Ur Rahman

TL;DR
This paper proposes an audiovisual saliency model that combines audio and visual cues to improve saliency prediction in uncategorized videos, outperforming two state-of-the-art visual-only models on the DIEM dataset.
Contribution
It introduces a generic audio/video saliency model that augments a visual saliency map with an audio saliency map, computed by synchronizing low-level audio and visual features, for natural videos with audio.
Findings
Model outperforms two state-of-the-art visual saliency models
Evaluated against eye-fixation data from the publicly available DIEM video dataset
Integrates audio-visual cues for better saliency prediction
Abstract
Substantial research has been done in saliency modeling to develop intelligent machines that can perceive and interpret their surroundings. However, existing models treat videos as mere image sequences and exclude any audio information, so they cannot cope with inherently varying content. Based on the hypothesis that an audiovisual saliency model will improve on traditional saliency models for natural uncategorized videos, this work aims to provide a generic audio/video saliency model that augments a visual saliency map with an audio saliency map computed by synchronizing low-level audio and visual features. The proposed model was evaluated using different criteria against eye-fixation data from the publicly available DIEM video dataset. The results show that the model outperformed two state-of-the-art visual saliency models.
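The idea of augmenting a visual saliency map with an audio map derived from audio-video correlation can be illustrated with a minimal sketch. The abstract does not specify which low-level features or which fusion rule the authors use, so everything below is an assumption made for illustration: frame-to-frame intensity change stands in for the visual feature, a per-frame audio energy value stands in for the audio feature, synchronization is approximated by per-pixel Pearson correlation, and fusion is a weighted sum with a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def audio_saliency(frames, audio_energy):
    """Assumed audio map: |Pearson correlation| per pixel between
    frame-to-frame intensity change and per-frame audio energy.

    frames: (T, H, W) grayscale video; audio_energy: (T,) values.
    """
    motion = np.abs(np.diff(np.asarray(frames, dtype=float), axis=0))  # (T-1, H, W)
    audio = np.asarray(audio_energy, dtype=float)[1:]  # align with motion
    m = motion - motion.mean(axis=0)
    a = audio - audio.mean()
    cov = np.tensordot(a, m, axes=(0, 0))  # (H, W)
    denom = np.sqrt((m ** 2).sum(axis=0) * (a ** 2).sum())
    # Pixels with no temporal variation get zero correlation.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return normalize(np.abs(corr))

def fuse(visual_map, audio_map, alpha=0.5):
    """Assumed fusion rule: weighted sum of the two normalized maps."""
    combined = (1 - alpha) * normalize(visual_map) + alpha * normalize(audio_map)
    return normalize(combined)
```

In this sketch, regions whose visual change tracks the soundtrack receive a higher audio saliency value, and `alpha` controls how strongly that map shifts the final prediction relative to the visual-only map.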
Taxonomy
Topics: Visual Attention and Saliency Detection · Olfactory and Sensory Function Studies · Multisensory perception and integration
