Audio-Visual Event Recognition through the lens of Adversary

Juncheng B Li; Kaixin Ma; Shuhui Qu; Po-Yao Huang; Florian Metze

arXiv:2011.07430·cs.CV·November 17, 2020

TL;DR

This paper investigates how robust audio-visual classification models are to adversarial noise. It uses adversarial attacks to compare fusion strategies, to measure how different features contribute to robustness, and to identify which neural modules are most vulnerable.

Contribution

The paper presents a study of adversarial attacks on multimodal audio-visual models, using the attacks as an ablation tool to compare fusion strategies and the robustness of different features.

Findings

01

Early/middle/late fusion impacts robustness and accuracy

02

Different frequency/time features contribute variably to robustness

03

Neural modules exhibit distinct vulnerabilities to adversarial noise
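
The difference between early and late fusion in the first finding can be sketched with a toy example. This is a minimal illustration, not the paper's architecture: the feature sizes, the linear classifiers, and the function names `early_fusion` and `late_fusion` are all assumptions made for this sketch, and middle fusion (merging intermediate representations) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D_VIDEO, N_CLASSES = 8, 6, 3  # hypothetical feature and label sizes

def early_fusion(audio, video, W):
    """Concatenate the two feature vectors, then apply one joint classifier."""
    fused = np.concatenate([audio, video])
    return W @ fused

def late_fusion(audio, video, W_a, W_v):
    """Classify each modality separately, then average the two sets of logits."""
    return 0.5 * (W_a @ audio + W_v @ video)

# Toy inputs and weights (random, for illustration only).
audio = rng.normal(size=D_AUDIO)
video = rng.normal(size=D_VIDEO)
W_a = rng.normal(size=(N_CLASSES, D_AUDIO))
W_v = rng.normal(size=(N_CLASSES, D_VIDEO))
W_joint = rng.normal(size=(N_CLASSES, D_AUDIO + D_VIDEO))

early_logits = early_fusion(audio, video, W_joint)
late_logits = late_fusion(audio, video, W_a, W_v)
```

With purely linear classifiers the two schemes differ only in how the weights are parameterized; the distinction becomes meaningful once each branch contains nonlinear layers, because an adversarial perturbation in one modality then propagates through different parts of the network.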

Abstract

As audio/visual classification models are widely deployed for sensitive tasks such as content filtering at scale, it is critical to understand their robustness as well as to improve their accuracy. This work studies several key questions related to multimodal learning through the lens of adversarial noise: 1) How does the choice of early, middle, or late fusion trade off robustness against accuracy? 2) How do different frequency- and time-domain features contribute to robustness? 3) How do different neural modules contribute to the adversarial noise? In our experiments, we construct adversarial examples to attack state-of-the-art neural models trained on Google AudioSet. We compare how much attack potency, measured as the size $\epsilon$ of the adversarial perturbation under different $L_{p}$ norms, is needed to "deactivate" the victim model. Using adversarial noise to ablate multimodal models, we are…
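
The idea of measuring the perturbation size $\epsilon$ needed to "deactivate" a model under different $L_{p}$ norms can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in, not the paper's attack: it uses a linear binary classifier, a single gradient step (sign step for $L_{\infty}$, normalized step for $L_{2}$), and a grid search over $\epsilon$; the names `perturb` and `min_eps_to_flip` are hypothetical.

```python
import numpy as np

def score(w, b, x):
    """Linear classifier score; the predicted label is the sign of the score."""
    return float(np.dot(w, x) + b)

def perturb(w, x, y, eps, norm):
    """One-step attack of size eps on a linear model, for label y in {+1, -1}.

    For a logistic loss, the gradient with respect to x is a positive
    multiple of -y * w, so only its direction matters here.
    """
    grad = -y * w
    if norm == "linf":
        return x + eps * np.sign(grad)                 # max-norm budget eps
    if norm == "l2":
        return x + eps * grad / np.linalg.norm(grad)   # Euclidean budget eps
    raise ValueError(f"unsupported norm: {norm}")

def min_eps_to_flip(w, b, x, y, norm, eps_grid):
    """Return the smallest eps on the grid that flips the prediction, or None."""
    for eps in eps_grid:
        x_adv = perturb(w, x, y, eps, norm)
        if y * score(w, b, x_adv) <= 0:
            return float(eps)
    return None

# Toy model and input (illustrative values only).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([1.0, -1.0, 2.0])   # clean score is 4.0, so the label is +1
grid = np.linspace(0.0, 3.0, 301)

eps_linf = min_eps_to_flip(w, b, x, +1, "linf", grid)
eps_l2 = min_eps_to_flip(w, b, x, +1, "l2", grid)
```

For this linear model the $L_{\infty}$ attack lowers the score by $\epsilon \lVert w \rVert_{1}$ and the $L_{2}$ attack by $\epsilon \lVert w \rVert_{2}$, so the two norms give different minimum budgets for the same input. Comparing such budgets across models is the kind of measurement the abstract describes, though real attacks on deep networks are typically iterative.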

Code & Models

Repositories

lijuncheng16/AudioSetDoneRight
PyTorch · Official
