# A Robust Approach for Securing Audio Classification Against Adversarial Attacks

**Authors:** Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich

arXiv: 1904.10990 · 2019-11-26

## TL;DR

This paper introduces an SVM-based audio classifier that uses pre-processed DWT spectrograms, SURF features, and a K-Means++ codebook to improve robustness against adversarial attacks while keeping accuracy competitive with deep neural networks.

## Contribution

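The paper first evaluates how well deep learning models and SVMs trained on 2D audio representations (STFT, DWT, CRP) withstand several adversarial attacks. It then proposes a new approach that combines DWT preprocessing, SURF feature extraction, a K-Means++ codebook, and SVM classification to enhance adversarial robustness in audio systems.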
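To make the DWT step named above concrete, here is a minimal sketch of a single-level discrete wavelet transform in plain Python. It assumes the orthonormal Haar wavelet purely for illustration; this summary does not state which wavelet the paper uses, and the function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Assumption: Haar is chosen only because it is the simplest wavelet;
    it is not claimed to be the wavelet used in the paper.
    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Pairwise scaled sums capture the low-frequency content.
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    # Pairwise scaled differences capture the high-frequency content.
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, reconstructing the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * scale)
        out.append((a - d) * scale)
    return out
```

Applying `haar_dwt` repeatedly to the approximation coefficients gives a multi-level decomposition; stacking coefficient magnitudes over time frames is one common way to obtain a 2D, spectrogram-like representation.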

## Key findings

- On three environmental sound datasets, the proposed method reaches accuracy competitive with deep neural networks.
- It is also competitive with deep neural networks in robustness against strong, state-of-the-art adversarial attacks.
- The approach offers a good trade-off between accuracy and resilience to attack.

## Abstract

An adversarial audio attack can be viewed as a small perturbation, imperceptible to human ears, that is intentionally added to the audio signal and causes a machine learning model to make mistakes. This raises a security concern about the safety of machine learning models, since adversarial attacks can fool such models into making wrong predictions. In this paper we first review some strong adversarial attacks that may affect both audio signals and their 2D representations, and we evaluate the resiliency of the most common machine learning models, namely deep learning models and support vector machines (SVM) trained on 2D audio representations such as short-time Fourier transform (STFT), discrete wavelet transform (DWT) and cross recurrent plot (CRP), against several state-of-the-art adversarial attacks. Next, we propose a novel approach based on a pre-processed DWT representation of audio signals and an SVM to secure audio systems against adversarial attacks. The proposed architecture has several preprocessing modules for generating and enhancing spectrograms, including dimension reduction and smoothing. We extract features from small patches of the spectrograms using the speeded-up robust features (SURF) algorithm; these features are then used to generate a codebook with the K-Means++ algorithm. Finally, the codewords are used to train an SVM on the codebook of the SURF-generated vectors. Together, these steps yield a novel approach for audio classification that provides a good trade-off between accuracy and resilience. Experimental results on three environmental sound datasets show the competitive performance of the proposed approach compared to deep neural networks, both in terms of accuracy and robustness against strong adversarial attacks.
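The codebook stage described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptors are plain tuples of floats standing in for SURF vectors, the names `build_codebook` and `encode` are hypothetical, and plain Lloyd iterations follow the K-Means++ seeding. Each recording is reduced to a normalized histogram of codeword counts, which is the kind of fixed-length vector an SVM could then be trained on.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the codeword closest to descriptor p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeanspp_init(points, k, rng):
    """K-Means++ seeding: each new center is sampled with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2)[0]))
    return centers


def build_codebook(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k codewords (K-Means++ seeding + Lloyd)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(descriptors, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in descriptors:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def encode(descriptors, centers):
    """Normalized histogram of codeword assignments for one recording."""
    hist = [0.0] * len(centers)
    for p in descriptors:
        hist[nearest(p, centers)] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist
```

In use, `build_codebook` would be run once over descriptors pooled from the training spectrograms, and `encode` would be called per recording to produce the feature vector passed to the classifier.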

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.10990/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1904.10990/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1904.10990/full.md

---
Source: https://tomesphere.com/paper/1904.10990