Fitting Auditory Filterbanks with Multiresolution Neural Networks

Vincent Lostanlen; Daniel Haider; Han Han; Mathieu Lagrange; Peter Balazs; Martin Ehler

arXiv:2307.13821 · cs.SD · July 9, 2024

TL;DR

This paper introduces MuReNN, a multiresolution neural network that combines a discrete wavelet transform with learnable convolutions to model auditory filterbanks, improving both time-frequency localization and the fit to real-world data.

Contribution

MuReNN integrates wavelet-based multiresolution analysis with neural networks, overcoming limitations of purely parametric or nonparametric models in auditory filterbank approximation.

Findings

01. MuReNN achieves a state-of-the-art fit to Gammatone, CQT, and third-octave filterbanks.

02. It improves time-frequency localization compared to conventional convnets.

03. MuReNN effectively combines domain knowledge with data-driven learning.

Abstract

Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the detriment of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable…
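The key idea stated in the abstract (separate convolutional operators over the octave subbands of a DWT) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the Haar wavelet is chosen only because it is short to write, the kernels are random rather than trained, and the names `haar_dwt`, `multires_layer`, and `haar_receptive_field` are hypothetical. The sketch also shows why a fixed kernel size covers an exponentially growing span of input samples from one octave to the next.

```python
import math
import random


def haar_step(x):
    """One level of an orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(half)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(half)]
    return approx, detail


def haar_dwt(x, num_octaves):
    """Cascade haar_step; returns the detail subbands, finest octave first."""
    subbands = []
    approx = list(x)
    for _ in range(num_octaves):
        approx, detail = haar_step(approx)
        subbands.append(detail)
    return subbands


def conv_valid(x, kernel):
    """1-D cross-correlation without padding, as computed by a conv layer."""
    k = len(kernel)
    return [sum(kernel[t] * x[n + t] for t in range(k))
            for n in range(len(x) - k + 1)]


def multires_layer(x, kernels):
    """Apply kernels[j] to octave subband j (one operator per octave)."""
    subbands = haar_dwt(x, len(kernels))
    return [conv_valid(band, kernel) for band, kernel in zip(subbands, kernels)]


def haar_receptive_field(kernel_size, octave):
    """Input samples seen by one output at a given octave (0 = finest).

    Haar-specific: each detail coefficient at this octave spans
    2 ** (octave + 1) non-overlapping input samples.
    """
    return kernel_size * 2 ** (octave + 1)


# Illustrative usage with random (untrained) kernels of equal size.
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(64)]
kernels = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
outputs = multires_layer(signal, kernels)
for octave, y in enumerate(outputs):
    print(octave, len(y), haar_receptive_field(len(kernels[octave]), octave))
```

In this sketch, every octave uses the same number of learnable weights (four per kernel), while the span of input samples covered doubles with each octave. A trained model would replace the random kernels with learned ones and would likely use a different wavelet.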

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Image and Signal Denoising Methods

Methods: Knowledge Distillation