Learning spectro-temporal representations of complex sounds with parameterized neural networks

Rachid Riad; Julien Karadayi; Anne-Catherine Bachoud-Lévi; Emmanuel Dupoux

arXiv:2103.07125·cs.SD·August 4, 2021

TL;DR

This paper introduces a fully interpretable, parameterized neural network layer that learns spectro-temporal modulations. Models built on it match the toplines on four auditory tasks, obtain the best performance on Speech Activity Detection, and learn filters that resemble features reported in the auditory cortex.

Contribution

The authors propose a novel, interpretable neural network layer based on Gabor kernels that effectively models spectro-temporal features across diverse auditory tasks.
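
As a rough illustration of what a Gabor-kernel spectro-temporal filter looks like, the sketch below builds a 2D Gabor kernel (a Gaussian envelope multiplied by a complex sinusoid) over a time-frequency patch and applies it to a spectrogram. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `gabor_kernel` and `gabor_response`, the parameter names (`sigma_t`, `sigma_f`, `omega_t`, `omega_f`), and the exact parameterization are hypothetical, and the parameters are fixed here rather than learned by gradient descent.

```python
import cmath
import math


def gabor_kernel(n_time, n_freq, sigma_t, sigma_f, omega_t, omega_f):
    """Build an n_freq x n_time complex 2D Gabor kernel (nested lists).

    Assumed parameterization (illustrative only):
      sigma_t, sigma_f: widths of the Gaussian envelope, in bins
      omega_t, omega_f: temporal and spectral modulation, in radians per bin
    """
    kernel = []
    for i in range(n_freq):
        f = i - (n_freq - 1) / 2  # frequency offset from the kernel centre
        row = []
        for j in range(n_time):
            t = j - (n_time - 1) / 2  # time offset from the kernel centre
            envelope = math.exp(-0.5 * ((t / sigma_t) ** 2 + (f / sigma_f) ** 2))
            carrier = cmath.exp(1j * (omega_t * t + omega_f * f))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(spectrogram, kernel):
    """Magnitude of the valid-mode 2D cross-correlation with the kernel."""
    n_f, n_t = len(kernel), len(kernel[0])
    out = []
    for i in range(len(spectrogram) - n_f + 1):
        row = []
        for j in range(len(spectrogram[0]) - n_t + 1):
            acc = 0j
            for a in range(n_f):
                for b in range(n_t):
                    acc += spectrogram[i + a][j + b] * kernel[a][b]
            row.append(abs(acc))
        out.append(row)
    return out


# Usage: a 7 x 9 kernel applied to a toy spectrogram of the same size.
kernel = gabor_kernel(n_time=9, n_freq=7, sigma_t=2.0, sigma_f=1.5,
                      omega_t=0.5, omega_f=0.3)
toy_spectrogram = [[0.0] * 9 for _ in range(7)]
toy_spectrogram[3][4] = 1.0  # single impulse at the centre bin
response = gabor_response(toy_spectrogram, kernel)
```

Because each filter is described by only four numbers in this sketch, its preferred temporal and spectral modulation can be read directly from `omega_t` and `omega_f`, which is the sense in which such a layer is interpretable.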

Findings

01 Learnable STRFs perform on par with state-of-the-art models.

02 Filters focus on low temporal and spectral modulations.

03 Learned features resemble those in human auditory cortex.

Abstract

Deep learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet these models often lack the interpretability needed to fully understand the exact computations they perform. Here, we propose a parameterized neural network layer that computes specific spectro-temporal modulations based on Gabor kernels (Learnable STRFs) and that is fully interpretable. We evaluated the predictive capabilities of this layer on Speech Activity Detection, Speaker Verification, Urban Sound Classification and Zebra Finch Call Type Classification. We found that models based on Learnable STRFs are on par with the different toplines on all tasks, and obtain the best performance for Speech Activity Detection. As this layer is fully interpretable, we used quantitative measures to describe the distribution of the learned…

Code & Models

Repositories

bootphon/learnable-strf (PyTorch, official)

Taxonomy

Methods: First Integer Neighbor Clustering Hierarchy