Cross-Representation Transferability of Adversarial Attacks: From Spectrograms to Audio Waveforms

Karl Michel Koerich; Mohammad Esmaeilpour; Sajjad Abdoli; Alceu de Souza Britto Jr.; Alessandro Lameiras Koerich

arXiv:1910.10106·cs.SD·July 30, 2020

TL;DR

This study demonstrates that adversarial attacks on spectrograms can transfer to audio waveforms, significantly reducing classifier accuracy and exposing vulnerabilities in spectrogram-based audio classification systems.

Contribution

The paper shows that adversarial attacks crafted on spectrogram representations transfer to the audio waveforms reconstructed from them, highlighting a new security challenge for audio classification.

Findings

01

Spectrogram-based attacks fool 2D CNNs: mean accuracy drops from up to 81.87% on legitimate examples to 12.09% on adversarial examples.

02

Reconstructed audio from perturbed spectrograms fools 1D CNNs with accuracy dropping from 78.29% to 27.91%.

03

Adversarial perturbations are visually imperceptible yet highly effective.
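
As a quick check on the size of these effects, the absolute accuracy drops implied by the figures above can be worked out directly. The values are in percentage points, and because the source reports the clean accuracies as "up to" figures, the differences below pair best-case numbers rather than averages over all configurations:

```latex
\begin{aligned}
\Delta_{\text{2D CNN}} &= 81.87\% - 12.09\% = 69.78 \text{ percentage points} \\
\Delta_{\text{1D CNN}} &= 78.29\% - 27.91\% = 50.38 \text{ percentage points}
\end{aligned}
```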

Abstract

This paper shows the susceptibility of spectrogram-based audio classifiers to adversarial attacks and the transferability of such attacks to audio waveforms. Some adversarial attacks commonly used on images have been applied to Mel-frequency and short-time Fourier transform spectrograms, and the perturbed spectrograms are able to fool a 2D convolutional neural network (CNN). The perturbations these attacks introduce are visually imperceptible to humans. Furthermore, the audio waveforms reconstructed from the perturbed spectrograms are also able to fool a 1D CNN trained on the original audio. Experimental results on a dataset of western music have shown that the 2D CNN achieves a mean accuracy of up to 81.87% on legitimate examples, and that this performance drops to 12.09% on adversarial examples. Likewise, the 1D CNN achieves a mean accuracy of up to 78.29% on original audio samples and…
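
The pipeline the abstract describes (perturb a spectrogram, then reconstruct a waveform from it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the attacks, so a single sign-gradient step in the style of FGSM is swapped in; the classifier is a hypothetical random linear model rather than a trained CNN; and the waveform is reconstructed by reusing the original phase, which is one possible choice the source does not confirm. The helper names `stft` and `istft`, the parameters `n_fft`, `hop`, and `eps`, and the synthetic input signal are all illustrative.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Complex STFT with a Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Guard against the zero-valued window endpoints.
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)

# Synthetic stand-in for an audio clip (assumption: not the paper's data).
t = np.arange(4096) / 8000.0
x = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.normal(size=t.size)

spec = stft(x)
mag, phase = np.abs(spec), np.angle(spec)

# Hypothetical linear "classifier" on the magnitude spectrogram:
# p(y = 1) = sigmoid(sum(W * mag)). It stands in for the 2D CNN.
W = rng.normal(scale=1e-3, size=mag.shape)

def score(m):
    return float(np.sum(W * m))

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# One sign-gradient step that increases the cross-entropy loss for the
# true label y = 1. For this model, d(loss)/d(mag) = (p - y) * W.
y = 1.0
eps = 0.05
grad = (sigmoid(score(mag)) - y) * W
mag_adv = np.maximum(mag + eps * np.sign(grad), 0.0)  # magnitudes stay >= 0

# Reconstruct a waveform from the perturbed magnitude and original phase.
x_adv = istft(mag_adv * np.exp(1j * phase))

score_clean, score_adv = score(mag), score(mag_adv)
print("score on clean spectrogram:    ", score_clean)
print("score on perturbed spectrogram:", score_adv)
print("max waveform change:           ", np.max(np.abs(x_adv - x)))
```

Note that a modified magnitude combined with the original phase is generally not a consistent STFT, so recomputing the spectrogram of `x_adv` does not return `mag_adv` exactly. Whether the perturbation still fools a classifier after that round trip is the transferability question the paper studies; this sketch only shows the mechanics.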

Code & Models

Repositories

karlmiko/ijcnn2020 (official)