From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks
Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

TL;DR
This study examines how different environmental sound spectrogram representations affect the recognition accuracy and adversarial robustness of ResNet-18. It reveals a trade-off in which higher accuracy tends to come with lower robustness, and finds that models trained on DWT spectrograms are harder to attack.
Contribution
It compares how different spectrogram representations affect CNN robustness against adversarial attacks in environmental sound classification.
Findings
DWT spectrograms lead to higher recognition accuracy.
Models trained on DWT are more costly for adversaries to attack.
An inverse relationship exists between accuracy and robustness.
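The findings describe robustness in terms of how costly a model is to attack, but this summary does not state the paper's cost metric. As a hedged illustration only, the sketch below uses a hypothetical proxy: the smallest L∞ budget ε at which a single fast gradient sign method (FGSM) step flips a prediction. It runs on a toy logistic-regression model rather than ResNet-18; the function names, weights, inputs, and grid step are all assumptions made for the example.

```python
import math


def predict_prob(w, b, x):
    """P(class 1 | x) for a toy logistic-regression classifier (numerically stable sigmoid)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(g):
    """Return -1, 0 or +1 according to the sign of g."""
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step x + eps * sign(grad_x loss) under binary cross-entropy.

    For logistic regression the input gradient of BCE is (p - y) * w.
    """
    p = predict_prob(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def min_flip_eps(w, b, x, y, step=0.01, max_eps=5.0):
    """Smallest eps on a grid at which one FGSM step flips the predicted label.

    Hypothetical 'attack cost' proxy: a larger value means the point is costlier to attack.
    Returns 0.0 if x is already misclassified and None if no eps <= max_eps works.
    """
    if int(predict_prob(w, b, x) >= 0.5) != y:
        return 0.0
    n = 1
    while n * step <= max_eps + 1e-12:
        eps = n * step
        if int(predict_prob(w, b, fgsm(w, b, x, y, eps)) >= 0.5) != y:
            return eps
        n += 1
    return None
```

With w = [1.0, -2.0], b = 0 and x = [1.0, -0.5] labelled 1, the logit is 2 and ||w||₁ = 3, so the label flips once ε exceeds 2/3 and the grid search returns ε ≈ 0.67. For a linear model a single FGSM step is the optimal L∞ attack, which is why it matches the analytic bound |z| / ||w||₁; for a deep network such as ResNet-18 it is only a heuristic.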
Abstract
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than more complex architectures is to balance recognition accuracy against the total number of trainable parameters. Herein, we measure the impact of different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing classification performance with adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmarking attack…
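To make two of the compared representations concrete, the sketch below computes a magnitude STFT (Hann-windowed frames, naive DFT) and a multi-level DWT in pure Python. It is not the authors' pipeline: the frame length, hop size, window, and the choice of the Haar wavelet are illustrative assumptions, since the paper's exact settings and mother wavelet are not given here, and MFCC is omitted for brevity.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft_magnitude(signal, frame_len=8, hop=4):
    """Magnitude STFT using a naive O(N^2) DFT on Hann-windowed frames.

    Returns one list of frame_len // 2 + 1 non-negative-frequency bins per frame.
    """
    win = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + k] * win[k] for k in range(frame_len)]
        bins = []
        for f in range(frame_len // 2 + 1):
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / frame_len)
                      for k in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def haar_dwt(signal, levels=2):
    """Multi-level orthonormal Haar DWT (Haar is an assumed stand-in wavelet).

    Returns [approx_L, detail_L, ..., detail_1], coarsest band first.
    """
    s = 1 / math.sqrt(2)
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        details.insert(0, detail)
    return [approx] + details
```

Stacking the STFT frames (or the per-band DWT coefficients) over time gives the 2D image a CNN such as ResNet-18 consumes. In practice, libraries such as librosa or PyWavelets would replace these naive loops; `pywt.wavedec` returns its coefficients in the same coarsest-first order.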
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
Methods: 1x1 Convolution · Dense Connections · Max Pooling · Convolution · Softmax · Average Pooling · Local Response Normalization · Inception Module · Auxiliary Classifier · Dropout
