Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian, Chenliang Xu

TL;DR
This paper systematically investigates how audio-visual integration affects the robustness of multisensory perception under adversarial attacks, revealing that integration can sometimes decrease robustness and proposing defenses to improve resilience.
Contribution
It provides the first comprehensive analysis of multimodal adversarial attacks on audio-visual models and introduces a novel defense strategy to enhance robustness.
Findings
Audio-visual models are vulnerable to multimodal adversarial attacks.
Audio-visual integration can reduce robustness under attacks.
Proposed defense improves model resilience without much performance loss.
Abstract
In this paper, we propose to make a systematic study on machines' multisensory perception under attacks. We use the audio-visual event recognition task against multimodal adversarial attacks as a proxy to investigate the robustness of audio-visual learning. We attack audio, visual, and both modalities to explore whether audio-visual integration still strengthens perception and how different fusion mechanisms affect the robustness of audio-visual models. For interpreting the multimodal interactions under attacks, we learn a weakly-supervised sound source visual localization model to localize sounding regions in videos. To mitigate multimodal attacks, we propose an audio-visual defense approach based on an audio-visual dissimilarity constraint and external feature memory banks. Extensive experiments demonstrate that audio-visual models are susceptible to multimodal adversarial attacks;…
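To make the three attack settings in the abstract concrete (audio only, visual only, both modalities), here is a minimal sketch. It is not the paper's model or attack: it assumes a toy linear late-fusion binary classifier and uses a single-step sign-gradient (FGSM-style) perturbation, chosen only because its gradient can be written by hand. All names (`fused_logit`, `fgsm_attack`, the weights, and `eps`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)


def fused_logit(x_a, x_v, w_a, w_v, b):
    # Toy late fusion (an assumption): each modality is scored linearly
    # and the two scores are summed.
    audio_score = sum(w * x for w, x in zip(w_a, x_a))
    visual_score = sum(w * x for w, x in zip(w_v, x_v))
    return audio_score + visual_score + b


def fgsm_attack(x_a, x_v, y, w_a, w_v, b, eps, attack=("audio", "visual")):
    # For binary cross-entropy, dLoss/dlogit = sigmoid(logit) - y, so the
    # input gradient of each modality is that scalar times its weights.
    g = sigmoid(fused_logit(x_a, x_v, w_a, w_v, b)) - y
    adv_a = list(x_a)
    adv_v = list(x_v)
    if "audio" in attack:
        adv_a = [x + eps * sign(g * w) for x, w in zip(x_a, w_a)]
    if "visual" in attack:
        adv_v = [x + eps * sign(g * w) for x, w in zip(x_v, w_v)]
    return adv_a, adv_v


if __name__ == "__main__":
    # Hypothetical weights and inputs; the true label is y = 1.
    w_a, w_v, b = [1.0, -2.0], [0.5, 1.5], 0.0
    x_a, x_v, y = [1.0, -0.5], [0.2, 0.4], 1
    for attack in [(), ("audio",), ("visual",), ("audio", "visual")]:
        adv_a, adv_v = fgsm_attack(x_a, x_v, y, w_a, w_v, b, 0.5, attack)
        p = sigmoid(fused_logit(adv_a, adv_v, w_a, w_v, b))
        print(attack or "clean", round(p, 3))
```

In this toy model, attacking both modalities lowers the correct-class logit by `eps` times the sum of the absolute weights across both modalities, so the joint attack is at least as damaging as either single-modality attack. Whether the same holds for a real audio-visual network depends on its fusion mechanism, which is the question the paper examines.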
Taxonomy
Topics: Digital Media Forensic Detection · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
