Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
Yingzi Fan, Longfei Han, Yue Zhang, Lechao Cheng, Chen Xia, Di Hu

TL;DR
This paper introduces a dual domain-adversarial learning approach that improves audio-visual saliency prediction under domain shift. It aligns the auditory and visual feature distributions of source and target data, which reduces the performance loss caused by domain discrepancy.
Contribution
It proposes a novel dual domain-adversarial framework that aligns both auditory and visual feature distributions across domains for unsupervised domain adaptation in saliency prediction.
Findings
Reduces the impact of domain discrepancy on audio-visual saliency prediction
Improves performance on public benchmarks
Demonstrates effectiveness of dual adversarial alignment
Abstract
Both visual and auditory information are valuable for determining the salient regions in videos. Deep convolutional neural networks (CNNs) show strong capacity for the audio-visual saliency prediction task. Due to various factors such as shooting scenes and weather, there is often a moderate distribution discrepancy between the source training data and the target testing data. This domain discrepancy degrades the performance of CNN models on the target testing data. This paper makes an early attempt to tackle the unsupervised domain adaptation problem for audio-visual saliency prediction. We propose a dual domain-adversarial learning algorithm to mitigate the domain discrepancy between source and target data. First, a specific domain discrimination branch is built to align the auditory feature distributions. Then, those auditory features are fused into the visual…
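The abstract is cut off before the fusion and second alignment step, so what follows is not the authors' architecture. It is a minimal sketch of how domain-adversarial alignment is commonly implemented: a gradient reversal layer, as in DANN (Ganin and Lempitsky, "Unsupervised Domain Adaptation by Backpropagation"). Each discriminator learns to tell source clips (y = 1) from target clips (y = 0). The reversed gradient pushes the encoders the other way. Whether the paper uses exactly this mechanism is an assumption. Everything else here is also hypothetical and chosen only to keep the example small and dependency-free: scalar features, linear encoders (`w_a`, `w_v`), logistic discriminators, additive fusion (`h = f_a + f_v`), and a second discriminator on the fused feature. All function names are illustrative as well. The supervised saliency loss on labeled source clips is omitted so that only the adversarial part is shown.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def domain_bce(disc, z, y):
    """BCE of a hypothetical logistic domain classifier disc = (a, c) on scalar feature z.

    y = 1.0 labels a source clip, y = 0.0 a target clip.
    """
    a, c = disc
    logit = a * z + c
    # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit).
    return max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))


def domain_grads(disc, z, y):
    """Return ((dL/da, dL/dc), dL/dz) for domain_bce."""
    a, c = disc
    g = sigmoid(a * z + c) - y  # derivative of the BCE w.r.t. the logit
    return (g * z, g), g * a


def features(enc, x_a, x_v):
    """Hypothetical encoders and fusion: f_a = w_a*x_a, fused h = f_a + w_v*x_v."""
    w_a, w_v = enc
    f_a = w_a * x_a
    return f_a, f_a + w_v * x_v


def total_domain_loss(enc, d_audio, d_fused, x_a, x_v, y):
    """Audio-branch domain loss plus fused-branch domain loss for one clip."""
    f_a, h = features(enc, x_a, x_v)
    return domain_bce(d_audio, f_a, y) + domain_bce(d_fused, h, y)


def dual_adversarial_step(enc, d_audio, d_fused, x_a, x_v, y, lam=1.0, lr=0.1):
    """One simultaneous update of both discriminators and both encoders."""
    f_a, h = features(enc, x_a, x_v)
    (ga_a, gc_a), gz_a = domain_grads(d_audio, f_a, y)
    (ga_f, gc_f), gz_f = domain_grads(d_fused, h, y)

    # Discriminators: plain gradient descent, i.e. get better at telling domains apart.
    new_d_audio = (d_audio[0] - lr * ga_a, d_audio[1] - lr * gc_a)
    new_d_fused = (d_fused[0] - lr * ga_f, d_fused[1] - lr * gc_f)

    # Encoders: gradient reversal multiplies the domain gradient by -lam,
    # so descending the reversed gradient means ascending the discriminators' loss.
    grad_w_a = (gz_a + gz_f) * x_a  # the audio feature reaches both discriminators
    grad_w_v = gz_f * x_v           # the visual feature reaches only the fused one
    new_enc = (enc[0] + lr * lam * grad_w_a, enc[1] + lr * lam * grad_w_v)
    return new_enc, new_d_audio, new_d_fused


enc, d_audio, d_fused = (1.0, 1.0), (0.5, 0.0), (0.5, 0.0)
x_a, x_v, y = 1.0, 2.0, 1.0  # one hypothetical source clip
before = total_domain_loss(enc, d_audio, d_fused, x_a, x_v, y)
new_enc, new_d_audio, new_d_fused = dual_adversarial_step(enc, d_audio, d_fused, x_a, x_v, y)
print(total_domain_loss(new_enc, d_audio, d_fused, x_a, x_v, y) > before)      # True: encoders confuse the old discriminators
print(total_domain_loss(enc, new_d_audio, new_d_fused, x_a, x_v, y) < before)  # True: discriminators separate domains better
```

The two printed checks show the minimax structure. After one step, the discriminators' loss on the old features goes down, and the old discriminators' loss on the new features goes up. Alternating such steps over source and target clips, together with the omitted saliency loss, is what pushes the two feature distributions toward each other. Because the audio feature feeds the fused branch in this sketch, the audio encoder receives reversed gradients from both discriminators. That is one plausible reading of "dual" alignment, not a detail confirmed by the truncated abstract.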
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Aesthetic Perception and Analysis
Methods: Convolution
