End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

Avi Gazneli; Gadi Zimerman; Tal Ridnik; Gilad Sharir; Asaf Noy

arXiv:2204.11479·cs.SD·July 6, 2022·28 cites

TL;DR

This paper introduces an efficient end-to-end audio classification network that leverages novel augmentations and lightweight architecture, achieving state-of-the-art results across multiple sound classification datasets.

Contribution

It presents a new end-to-end audio classification model utilizing novel augmentations and lightweight design, reducing reliance on multiple representations and large architectures.

Findings

01

Achieved state-of-the-art results on various sound classification datasets.

02

Demonstrated robustness and generalization of the proposed approach.

03

Validated effectiveness through extensive experiments.

Abstract

While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets. By utilizing the inherent lightweight nature of audio and novel audio augmentations, we were able to present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification sets demonstrate the effectiveness and robustness of our approach, by achieving state-of-the-art results in various settings. Public code is available at: https://github.com/Alibaba-MIIL/AudioClassfication
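
The abstract does not spell out the augmentations themselves. As a rough illustration of what augmenting raw waveforms for an end-to-end model can look like, the sketch below applies a random circular time shift, a random gain, and additive Gaussian noise to a list of samples. These are generic techniques chosen for illustration and are not claimed to be the paper's augmentations; the function name `augment_waveform` and all parameter values are hypothetical.

```python
import math
import random


def augment_waveform(samples, rng, max_shift=0.1, gain_db=6.0, noise_std=0.005):
    """Apply three generic waveform-level augmentations (illustrative only)."""
    n = len(samples)
    # Random circular time shift of up to max_shift * n samples in either direction.
    limit = int(max_shift * n)
    shift = rng.randint(-limit, limit)
    shifted = [samples[(i - shift) % n] for i in range(n)]
    # Random gain drawn uniformly in decibels, converted to a linear factor.
    gain = 10.0 ** (rng.uniform(-gain_db, gain_db) / 20.0)
    # Add Gaussian noise, then clip every sample to the valid [-1, 1] range.
    return [max(-1.0, min(1.0, gain * s + rng.gauss(0.0, noise_std))) for s in shifted]


# One second of a 440 Hz tone at 16 kHz stands in for a real recording.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
augmented = augment_waveform(tone, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible. In a training loop one would typically draw fresh randomness for every example so that the network sees a different variant of each clip in each epoch.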

Code & Models

Repositories

Alibaba-MIIL/AudioClassfication
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis