Weakly Supervised Visual-Auditory Fixation Prediction with Multigranularity Perception

Guotao Wang, Chenglizhao Chen, Deng-Ping Fan, Aimin Hao, and Hong Qin

arXiv:2112.13697 · cs.CV · August 1, 2022


TL;DR

This paper introduces a weakly supervised method for visual-audio fixation prediction that leverages class activation mapping and multi-granularity perception, reducing the need for large-scale fixation datasets while achieving competitive performance.

Contribution

It proposes a novel weakly supervised approach using only video category tags and class activation mapping to predict visual-audio fixations, with an upgraded multi-granularity perception mechanism.

Findings

1. The method achieves performance comparable to fully supervised models.
2. It reduces the reliance on large-scale fixation datasets.
3. The approach is applicable even without video tags, broadening its usability.

Abstract

Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has improved steadily and significantly. However, deep learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations recorded in real visual-audio environments. Hence, it would be neither efficient nor necessary to recollect real fixations under the same visual-audio circumstances. To address this problem, this paper proposes a novel weakly supervised approach that alleviates the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM) and its upgrade (SCAM+). In the spatial-temporal-audio circumstance,…
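The abstract builds on class activation mapping, which turns a classifier trained only on category tags into a spatial map. The sketch below shows the standard CAM formulation (Zhou et al., 2016), not the paper's SCAM or SCAM+, whose selection and multigranularity steps are not described here. It assumes a backbone whose last convolutional features (shape C×H×W) feed global average pooling and a linear classifier with weights of shape K×C. The function name `class_activation_map` is hypothetical.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Standard CAM sketch (hypothetical helper, not the paper's SCAM).

    features:   (C, H, W) feature maps from the last conv layer (assumed)
    fc_weights: (K, C) weights of a linear classifier applied after
                global average pooling (assumed architecture)
    class_idx:  index of the video category tag
    Returns an (H, W) map scaled to [0, 1].
    """
    # Weighted sum over channels: M_k(x, y) = sum_c w[k, c] * F[c, x, y]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence for the chosen class
    cam = np.maximum(cam, 0.0)
    # Divide by the peak so the map lies in [0, 1]; all-zero maps stay zero
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((8, 7, 7))            # 8 channels on a 7x7 grid
    weights = rng.standard_normal((5, 8))    # 5 hypothetical categories
    m = class_activation_map(feats, weights, class_idx=2)
    print(m.shape)  # (7, 7)
```

Because the classifier sees only the pooled features, the map's pre-ReLU mean equals the class logit. This is why category tags alone can localize class-relevant regions, which a weakly supervised fixation model can then refine.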


Code & Models

Repositories

guotaowang/STANet
PyTorch · Official


Taxonomy

Topics: Visual Attention and Saliency Detection · Image Enhancement Techniques · Image and Video Quality Assessment