Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction

Yingzi Fan; Longfei Han; Yue Zhang; Lechao Cheng; Chen Xia; Di Hu

arXiv:2208.05220 · cs.CV · August 17, 2022

Open Access

TL;DR

This paper introduces a dual domain-adversarial learning approach to improve audio-visual saliency prediction across different domains by aligning auditory and visual features, reducing performance loss due to domain shifts.

Contribution

It proposes a novel dual domain-adversarial framework that aligns both auditory and visual features for unsupervised domain adaptation in saliency prediction.

Findings

01 Reduces domain discrepancy effects in audio-visual saliency prediction

02 Improves performance on public benchmarks

03 Demonstrates effectiveness of dual adversarial alignment

Abstract

Both visual and auditory information are valuable for determining the salient regions in videos. Deep convolutional neural networks (CNNs) show strong capacity for the audio-visual saliency prediction task. Due to factors such as shooting scenes and weather, there is often a moderate distribution discrepancy between source training data and target testing data. This domain discrepancy degrades the performance of CNN models on the target testing data. This paper makes an early attempt to tackle the unsupervised domain adaptation problem for audio-visual saliency prediction. We propose a dual domain-adversarial learning algorithm to mitigate the domain discrepancy between source and target data. First, a specific domain discrimination branch is built to align the auditory feature distributions. Then, those auditory features are fused into the visual…
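To make the idea of a domain discrimination branch concrete, here is a minimal sketch of one adversarial alignment step in plain Python. It is not the authors' implementation: the abstract does not specify the discriminator architecture, the loss, or the fusion step. The sketch assumes a logistic domain discriminator trained with binary cross-entropy and a gradient-reversal update for the feature extractor, which is a common way to realize domain-adversarial training. All names (`domain_loss_and_grads`, `reverse_gradient`, `lam`) and the feature values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss_and_grads(feature, domain_label, weights, bias):
    """Binary cross-entropy of a logistic domain discriminator (an assumption).

    domain_label is 1 for a source sample and 0 for a target sample.
    Returns (loss, grad_wrt_weights, grad_wrt_bias, grad_wrt_feature).
    """
    logit = sum(w * f for w, f in zip(weights, feature)) + bias
    p = sigmoid(logit)
    eps = 1e-12  # guards against log(0)
    loss = -(domain_label * math.log(p + eps)
             + (1 - domain_label) * math.log(1.0 - p + eps))
    dlogit = p - domain_label  # derivative of the loss w.r.t. the logit
    grad_w = [dlogit * f for f in feature]
    grad_f = [dlogit * w for w in weights]
    return loss, grad_w, dlogit, grad_f


def reverse_gradient(grad, lam=1.0):
    """Gradient reversal: the feature extractor receives -lam * gradient,
    so it is pushed to make the two domains harder to tell apart."""
    return [-lam * g for g in grad]


# "Dual" alignment, sketched as two independent discriminators:
# one on hypothetical auditory features, one on hypothetical visual features.
audio_feature, audio_disc = [0.5, -1.0], ([0.1, 0.2], 0.0)
visual_feature, visual_disc = [1.0, 2.0, 0.0], ([0.3, -0.1, 0.2], 0.0)

audio_loss, _, _, audio_gf = domain_loss_and_grads(audio_feature, 1, *audio_disc)
visual_loss, _, _, visual_gf = domain_loss_and_grads(visual_feature, 1, *visual_disc)

# Signals that would flow back into the (hypothetical) feature extractors.
audio_signal = reverse_gradient(audio_gf, lam=0.5)
visual_signal = reverse_gradient(visual_gf, lam=0.5)
```

In this sketch each discriminator descends its own loss, while the reversed gradient moves the features in the opposite direction; the weight `lam` trades off alignment against the saliency prediction loss, which is omitted here.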

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Aesthetic Perception and Analysis

Methods: Convolution