Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection
Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang

TL;DR
This paper investigates the vulnerability of audio-visual active speaker detection models to adversarial attacks and introduces a novel loss function to improve their robustness, demonstrating significant performance gains over existing defenses.
Contribution
The paper is the first to analyze adversarial robustness in audio-visual active speaker detection (AVASD) and proposes the audio-visual interaction loss (AVIL), a new loss function that makes models more resilient to multi-modal attacks.
Findings
AVASD models are vulnerable to adversarial attacks across modalities.
The proposed AVIL improves robustness, outperforming adversarial training by 33.14 mAP (%) under multi-modal attacks.
Experimental results validate the effectiveness of AVIL in defending against multi-modal adversarial attacks.
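To make "attacks across modalities" concrete, here is a minimal sketch of a single signed-gradient perturbation step with a separate L-infinity budget per modality. It is a toy illustration, not the paper's attack setup: the linear classifier, the function names (`score`, `signed_gradient_attack`), and the budget parameters (`eps_audio`, `eps_visual`) are all assumptions made for this example.

```python
import math


def score(w_audio, w_visual, bias, audio, visual):
    """Logit of a toy linear 'is speaking' classifier over both modalities."""
    return (
        sum(w * x for w, x in zip(w_audio, audio))
        + sum(w * x for w, x in zip(w_visual, visual))
        + bias
    )


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def signed_gradient_attack(w_audio, w_visual, bias, audio, visual, label,
                           eps_audio, eps_visual):
    """Take one signed-gradient step on each modality (hypothetical helper).

    label is 1 (speaking) or 0 (not speaking). Each modality is perturbed by
    at most its own budget per feature; a budget of 0 leaves that modality
    untouched, which gives an audio-only or visual-only attack.
    """
    logit = score(w_audio, w_visual, bias, audio, visual)
    prob = 1.0 / (1.0 + math.exp(-logit))
    # For binary cross-entropy, d(loss)/d(logit) = prob - label,
    # and for a linear model d(logit)/d(x_i) = w_i.
    grad_logit = prob - label
    adv_audio = [x + eps_audio * sign(grad_logit * w)
                 for x, w in zip(audio, w_audio)]
    adv_visual = [x + eps_visual * sign(grad_logit * w)
                  for x, w in zip(visual, w_visual)]
    return adv_audio, adv_visual
```

Calling the helper with both budgets positive perturbs audio and visual inputs jointly, while setting one budget to zero restricts the perturbation to the other modality. A real attack on a deep AVASD model would obtain the gradient by backpropagation and typically iterate the step.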
Abstract
Audio-visual active speaker detection (AVASD) is well developed and is now an indispensable front-end for several multi-modal applications. However, to the best of our knowledge, the adversarial robustness of AVASD models has not been investigated, let alone effective defenses against such attacks. In this paper, we are the first to reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks through extensive experiments. We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples under an allocated attack budget. The loss pushes the inter-class embeddings, namely the non-speech and speech clusters, apart so that they are sufficiently disentangled, and pulls the intra-class embeddings as close together as possible to keep them compact. Experimental results…
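The push-pull idea described in the abstract can be sketched with a simple centroid-based loss. This is only an illustration of the general principle, not the paper's AVIL formulation: the function names, the squared-Euclidean distance, and the hinge `margin` are assumptions chosen for this example.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def push_pull_loss(speech, non_speech, margin=1.0):
    """Toy push-pull objective over two embedding clusters (hypothetical).

    Pull term: mean squared distance of every embedding to its own class
    centroid, which is small when each cluster is compact.
    Push term: hinge penalty that is positive while the two class centroids
    are closer than `margin` (in squared distance) and zero otherwise.
    """
    speech_center = centroid(speech)
    non_speech_center = centroid(non_speech)
    pull_terms = [squared_distance(e, speech_center) for e in speech]
    pull_terms += [squared_distance(e, non_speech_center) for e in non_speech]
    pull = sum(pull_terms) / len(pull_terms)
    push = max(0.0, margin - squared_distance(speech_center, non_speech_center))
    return pull + push
```

Under these assumptions the loss is zero when both clusters are tight and their centroids are at least `margin` apart, and it grows when the clusters overlap or spread out. The intuition is that compact, well-separated clusters force a perturbation to travel further before an embedding crosses into the other class, which is harder to do within a fixed attack budget.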
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Anomaly Detection Techniques and Applications
