On Adversarial Robustness of Large-scale Audio Visual Learning

Juncheng B Li; Shuhui Qu; Xinjian Li; Po-Yao Huang; Florian Metze

arXiv:2203.12122 · cs.SD · April 22, 2022 · 1 citation

TL;DR

This paper investigates the adversarial robustness of large-scale multi-modal audio-visual models, proposing new metrics and a mix-up training strategy to evaluate and enhance their robustness against adversarial attacks.

Contribution

It introduces density-based and convexity metrics for measuring multi-modal robustness and proposes a mix-up training method as a computationally efficient robustness enhancement.
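In its standard form, mix-up trains on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta(α, α) distribution. This summary does not say how the authors apply it across audio and visual inputs. The sketch below therefore shows only the generic batch-level technique on plain Python lists. The function name `mixup_batch`, the default `alpha=0.2`, and the choice of one mixing weight per batch are illustrative assumptions, not the paper's implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def mixup_batch(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    alpha: float = 0.2,  # assumed Beta concentration; not taken from the paper
    seed: Optional[int] = None,
) -> Tuple[Matrix, Matrix, float]:
    """Generic mixup (hypothetical helper): blend each example with a random partner.

    xs are feature vectors (e.g. flattened audio or visual features) and ys are
    one-hot or soft label vectors; both lists share the same batch size.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same batch size")
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)  # one mixing weight for the whole batch
    perm = list(range(len(xs)))
    rng.shuffle(perm)  # perm[i] is the partner index for example i
    mixed_x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(xs[i], xs[j])]
        for i, j in enumerate(perm)
    ]
    mixed_y = [
        [lam * a + (1.0 - lam) * b for a, b in zip(ys[i], ys[j])]
        for i, j in enumerate(perm)
    ]
    return mixed_x, mixed_y, lam
```

Because both inputs and labels are mixed with the same weight, one-hot labels become soft labels that still sum to 1. Mixing only requires an extra shuffle and a weighted sum per batch, whereas standard adversarial training must craft attacks on every step.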

Findings

1. Multi-modal models are not inherently more robust than uni-modal models.
2. The proposed metrics effectively evaluate the distribution of modalities in high-dimensional space.
3. Mix-up training can match traditional adversarial training in improving robustness.

Abstract

As audio-visual systems are being deployed for safety-critical tasks such as surveillance and malicious content filtering, their robustness remains an under-studied area. Existing published work on robustness either does not scale to large-scale datasets or does not deal with multiple modalities. This work aims to study several key questions related to multi-modal learning through the lens of robustness: 1) Are multi-modal models necessarily more robust than uni-modal models? 2) How can the robustness of multi-modal learning be measured efficiently? 3) How should different modalities be fused to achieve a more robust multi-modal model? To understand the robustness of the multi-modal model in a large-scale setting, we propose a density-based metric and a convexity metric to efficiently measure the distribution of each modality in high-dimensional latent space. Our work provides a theoretical…
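The abstract names a density-based metric for how each modality's embeddings are distributed in latent space, but this excerpt does not define it. Purely as a hypothetical illustration, and not the authors' metric, one common proxy for local density is the inverse of the mean distance to a point's k nearest neighbours. The helpers `knn_density` and `mean_density` and the default `k=3` are assumptions.

```python
import math
from typing import List, Sequence


def knn_density(points: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Hypothetical density proxy: 1 / (mean distance to the k nearest neighbours).

    This is a generic k-NN density estimate, NOT the paper's density-based metric.
    Larger scores mean a point sits in a more tightly packed region.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Brute-force Euclidean distances from p to every other embedding.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        avg = sum(dists[:k]) / k
        scores.append(math.inf if avg == 0 else 1.0 / avg)
    return scores


def mean_density(points: Sequence[Sequence[float]], k: int = 3) -> float:
    """Average density score over one modality's set of embeddings."""
    scores = knn_density(points, k)
    return sum(scores) / len(scores)
```

Computing `mean_density` separately on the audio and the visual embeddings of the same clips gives one number per modality to compare. The brute-force distance loop is O(n²), so a large-scale evaluation would likely need an approximate nearest-neighbour index instead.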

Code & Models

Repositories

lijuncheng16/AudioSetDoneRight
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Anomaly Detection Techniques and Applications