Instabilities in Convnets for Raw Audio

Daniel Haider; Vincent Lostanlen; Martin Ehler; Peter Balazs

arXiv:2309.05855·cs.LG·April 29, 2024

TL;DR

This paper investigates why training convolutional neural networks for raw audio is challenging, focusing on the impact of initialization and filter size on stability and approximation quality.

Contribution

The paper presents a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights, and shows that these deviations worsen for large filters and locally periodic input signals, both of which are typical in audio convnets.

Findings

01 Deviations of the filterbank energy response worsen for large filters and locally periodic inputs.

02 Numerical simulations align with the theory.

03 Simulations suggest that the condition number of a convolutional layer follows a logarithmic scaling law.
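
The first two findings can be explored with a small Monte Carlo experiment. The sketch below is not taken from the paper or its repository. It assumes circular convolution, i.i.d. Gaussian weights with variance 1/(J*T) for J filters of length T (so that the expected output energy equals the input energy), and two hypothetical test signals: white noise and a pure cosine. The function names `energy_ratio` and `ratio_stats` are illustrative.

```python
import numpy as np

def energy_ratio(x, num_filters, filter_len, rng):
    """Return ||Phi x||^2 / ||x||^2 for one random Gaussian FIR filterbank Phi."""
    n = len(x)
    # Assumption of this sketch: i.i.d. weights with variance 1 / (J * T),
    # which makes the expected energy ratio equal to 1.
    sigma = 1.0 / np.sqrt(num_filters * filter_len)
    w = rng.normal(0.0, sigma, size=(num_filters, filter_len))
    # Circular convolution via the FFT; filters are zero-padded to length n.
    W = np.fft.rfft(w, n=n, axis=1)
    X = np.fft.rfft(x)
    y = np.fft.irfft(W * X, n=n, axis=1)
    return np.sum(y ** 2) / np.sum(x ** 2)

def ratio_stats(x, num_filters, filter_len, trials=200, seed=0):
    """Empirical mean and variance of the energy ratio over random draws."""
    rng = np.random.default_rng(seed)
    r = np.array([energy_ratio(x, num_filters, filter_len, rng)
                  for _ in range(trials)])
    return r.mean(), r.var()

n = 1024
noise = np.random.default_rng(1).standard_normal(n)   # aperiodic input
sine = np.cos(2 * np.pi * 32 * np.arange(n) / n)      # periodic input

for T in (8, 64, 512):
    print("T =", T,
          "| noise (mean, var):", ratio_stats(noise, 16, T),
          "| sine (mean, var):", ratio_stats(sine, 16, T))
```

Under these assumptions the mean ratio stays close to 1 for both inputs, while the spread around it depends on the autocorrelation of the input. Varying `filter_len` and `num_filters` shows how the spread changes with the layer shape.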

Abstract

What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In our article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law…
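
As a rough illustration of the condition number mentioned at the end of the abstract, the sketch below measures it for a randomly initialized layer. It is an assumption-laden example, not the authors' code: it assumes stride-1 circular convolution, in which case the frame operator is diagonalized by the DFT and its eigenvalues are the sums of squared filter frequency responses. The function names and the chosen sizes are hypothetical, and the sketch does not attempt to reproduce the scaling law itself.

```python
import numpy as np

def condition_number(num_filters, filter_len, signal_len, rng):
    """Condition number of a random Gaussian FIR filterbank (circular, stride 1)."""
    # Assumption of this sketch: i.i.d. weights with variance 1 / (J * T).
    sigma = 1.0 / np.sqrt(num_filters * filter_len)
    w = rng.normal(0.0, sigma, size=(num_filters, filter_len))
    # Frequency responses of the zero-padded filters.
    W = np.fft.fft(w, n=signal_len, axis=1)
    # Eigenvalues of the frame operator: sum over filters of |W_j[k]|^2.
    eig = np.sum(np.abs(W) ** 2, axis=0)
    # Ratio of the largest to the smallest eigenvalue (upper / lower frame bound).
    return eig.max() / eig.min()

def median_condition_number(num_filters, filter_len, signal_len=1024,
                            trials=50, seed=0):
    """Median condition number over independent random initializations."""
    rng = np.random.default_rng(seed)
    return float(np.median([
        condition_number(num_filters, filter_len, signal_len, rng)
        for _ in range(trials)
    ]))

for J in (4, 16, 64):
    print("J =", J, "T = 64 | median condition number:",
          median_condition_number(J, 64))
```

A value of 1 would mean the layer preserves the energy of every input exactly. Larger values indicate that some inputs are amplified much more than others at initialization.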

Code & Models

Repositories

danedane-haider/random-filterbanks
PyTorch · Official

Taxonomy

Topics: Acoustic Wave Phenomena Research · Image and Signal Denoising Methods · Underwater Acoustics Research