ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Andrey Guzhov; Federico Raue; Jörn Hees; Andreas Dengel

arXiv:2104.11587 · cs.SD · April 26, 2021

TL;DR

This paper introduces a novel time-frequency transformation layer based on complex frequency B-spline wavelets. In environmental sound classification, it improves accuracy and robustness compared with the commonly used Short-Time Fourier Transform (STFT).

Contribution

The paper presents a new time-frequency transformation layer (fbsp-layer) that improves the classification accuracy and robustness of audio models. It also analyzes different pre-training strategies and the model's resilience to noise.
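
The fbsp-layer is built on the complex frequency B-spline wavelet. The sketch below is a rough illustration only and uses plain Python with no dependencies. It takes the wavelet form documented for PyWavelets' `fbsp` family, psi(t) = sqrt(fb) * sinc(fb*t/m)^m * exp(2*pi*i*fc*t). It then slides one sampled wavelet per center frequency over a signal to get a magnitude time-frequency map. The helper names (`fbsp_wavelet`, `fbsp_transform`), the `support` truncation and all parameter values are hypothetical. The paper's title indicates that its transformation is learned, and its exact parametrization may differ from this fixed-parameter version.

```python
import cmath
import math


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi * x) / (pi * x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def fbsp_wavelet(t: float, m: int, fb: float, fc: float) -> complex:
    """Complex frequency B-spline wavelet in the PyWavelets 'fbsp' form.

    psi(t) = sqrt(fb) * sinc(fb * t / m) ** m * exp(2j * pi * fc * t)
    m: integer spline order, fb: bandwidth, fc: center frequency.
    """
    envelope = math.sqrt(fb) * sinc(fb * t / m) ** m
    return envelope * cmath.exp(2j * math.pi * fc * t)


def fbsp_transform(signal, sample_rate, center_freqs, m=1, fb=100.0, support=0.05):
    """Hypothetical fixed-parameter fbsp filterbank (not the paper's layer).

    For each center frequency (Hz), sample the wavelet on [-support, support]
    seconds and slide it over the signal (zero padding, same-length output).
    Returns one row of magnitudes per center frequency.
    """
    half = int(round(support * sample_rate))
    taps = [k / sample_rate for k in range(-half, half + 1)]
    rows = []
    for fc in center_freqs:
        # Conjugated kernel: each output is a windowed Fourier coefficient at fc.
        kernel = [fbsp_wavelet(t, m, fb, fc).conjugate() for t in taps]
        row = []
        for n in range(len(signal)):
            acc = 0j
            for k, w in enumerate(kernel):
                idx = n + k - half
                if 0 <= idx < len(signal):
                    acc += signal[idx] * w
            row.append(abs(acc))
        rows.append(row)
    return rows
```

Consider a 100 Hz tone sampled at 1 kHz. The row for fc = 100 Hz responds much more strongly than the row for fc = 300 Hz. This is because, with m = 1, the sinc envelope gives each filter a passband roughly fb wide around fc. Unlike a fixed STFT window, fc, fb and m are explicit parameters here, which is what makes this kind of transform a natural candidate for learning.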

Findings

01

Achieved 95.20% accuracy on the ESC-50 dataset.

02

Achieved 89.14% accuracy on the UrbanSound8K dataset.

03

Demonstrated increased robustness against noise and signal reduction.
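
The robustness finding concerns noise and signal reduction. One way to probe a trained classifier like this is to corrupt test clips before classifying them. The helpers below are a hedged sketch of such corruptions. Adding white Gaussian noise at a chosen signal-to-noise ratio is a common protocol. However, the noise type, the SNR levels and the sample-and-hold form of "signal reduction" are assumptions here, not details taken from the paper. `reduce_sample_rate` in particular is a hypothetical, crude decimation with no anti-aliasing filter.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def reduce_sample_rate(signal, factor):
    """Hypothetical signal reduction: keep every `factor`-th sample and hold it.

    This lowers the effective sample rate while preserving the original length.
    No anti-aliasing filter is applied.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [signal[(n // factor) * factor] for n in range(len(signal))]


def snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its corrupted version."""
    p_signal = sum(x * x for x in clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)
```

A robustness curve could then be traced by classifying the same clips at decreasing SNR values or increasing reduction factors. The resulting accuracies for fbsp-based and STFT-based front ends can then be compared.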

Abstract

Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modifications of cross-domain approaches show promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer that is based on complex frequency B-spline (fbsp) wavelets. When used with a high-performance audio classification model, the proposed fbsp-layer improves accuracy over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets for weight initialization: ImageNet and AudioSet. Our proposed model outperforms other approaches by achieving…

Code & Models

Repositories

AndreyGuzhov/ESResNeXt-fbsp (PyTorch, official)