TL;DR
This paper introduces a fully interpretable, parameterized neural network layer that learns spectro-temporal modulations for auditory tasks, matching or surpassing existing models and aligning with biological auditory features.
Contribution
The authors propose a novel, interpretable neural network layer based on Gabor kernels that effectively models spectro-temporal features across diverse auditory tasks.
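The following is a minimal pure-Python sketch of one such filter. It assumes a generic complex 2-D Gabor kernel applied to a spectrogram indexed as frequency × time. The kernel is a Gaussian envelope (widths `sigma_t`, `sigma_f`) multiplied by a plane-wave carrier with temporal modulation `omega_t` (cycles per frame) and spectral modulation `omega_f` (cycles per frequency bin). The function names and this exact parameterization are illustrative assumptions, not the authors' implementation. In a trainable layer these four numbers would be the learned parameters of each filter, and the paper's formulation may tie them together differently.

```python
import math


def gabor_strf(n_time, n_freq, omega_t, omega_f, sigma_t, sigma_f):
    """Return (real, imag) Gabor kernels as n_freq x n_time nested lists.

    Hypothetical parameterization: omega_t in cycles/frame, omega_f in
    cycles/bin, sigma_t and sigma_f are envelope widths in frames and bins.
    """
    real, imag = [], []
    for i in range(n_freq):
        f = i - (n_freq - 1) / 2  # frequency offset from the kernel centre
        row_r, row_i = [], []
        for j in range(n_time):
            t = j - (n_time - 1) / 2  # time offset from the kernel centre
            env = math.exp(-(t * t) / (2 * sigma_t ** 2) - (f * f) / (2 * sigma_f ** 2))
            phase = 2 * math.pi * (omega_t * t + omega_f * f)
            row_r.append(env * math.cos(phase))
            row_i.append(env * math.sin(phase))
        real.append(row_r)
        imag.append(row_i)
    return real, imag


def strf_response(spec, kernel):
    """Valid-mode 2-D cross-correlation of a (freq x time) spectrogram with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(spec) - kh + 1):
        row = []
        for x in range(len(spec[0]) - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += spec[y + a][x + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def modulation_energy(spec, params, n_time=9, n_freq=9):
    """Magnitude of the quadrature (real, imag) responses of one Gabor filter."""
    real, imag = gabor_strf(n_time, n_freq, *params)
    r = strf_response(spec, real)
    i = strf_response(spec, imag)
    return [[math.hypot(a, b) for a, b in zip(ra, ia)] for ra, ia in zip(r, i)]


# Usage: a 16 x 32 spectrogram that ripples at 0.1 cycles/frame in time only.
spec = [[math.cos(2 * math.pi * 0.1 * t) for t in range(32)] for _ in range(16)]
matched = modulation_energy(spec, (0.1, 0.0, 2.0, 2.0))      # tuned to the ripple
mismatched = modulation_energy(spec, (0.0, 0.25, 2.0, 2.0))  # tuned to a spectral ripple
print(min(min(r) for r in matched) > max(max(r) for r in mismatched))
```

On this synthetic input, the filter tuned to the temporal ripple responds strongly everywhere, while the filter tuned to a purely spectral ripple barely responds. Taking the magnitude of the real and imaginary responses makes the output largely insensitive to the ripple's phase.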
Findings
Learnable STRFs perform on par with state-of-the-art models.
Filters focus on low temporal and spectral modulations.
Learned features resemble those in human auditory cortex.
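Because each filter is described by only a few parameters, a claim such as "filters focus on low modulations" can be checked by converting each filter's modulation frequencies into physical units. The sketch below assumes that modulations are expressed in cycles per frame and cycles per frequency bin, that the frame rate is known in Hz, and that the frequency axis is logarithmic with a fixed number of bins per octave. `to_physical_units`, `nyquist_limits` and the example values are hypothetical, not taken from the paper.

```python
def to_physical_units(omega_t, omega_f, frame_rate_hz, bins_per_octave):
    """Convert normalised modulations to (rate in Hz, scale in cycles/octave).

    Assumes omega_t is in cycles/frame and omega_f in cycles/bin on a
    log-frequency axis with a constant number of bins per octave (hypothetical setup).
    """
    rate_hz = omega_t * frame_rate_hz                # cycles/frame * frames/s
    scale_cyc_per_oct = omega_f * bins_per_octave    # cycles/bin * bins/octave
    return rate_hz, scale_cyc_per_oct


def nyquist_limits(frame_rate_hz, bins_per_octave):
    """Highest representable rate and scale: half a cycle per frame and per bin."""
    return frame_rate_hz / 2, bins_per_octave / 2


# Usage with hypothetical values: 10 ms hop (100 frames/s), 12 bins per octave.
rate, scale = to_physical_units(0.04, 0.05, 100.0, 12)
print(rate, scale, nyquist_limits(100.0, 12))
```

With these settings, a filter at 0.04 cycles/frame corresponds to a 4 Hz rate and 0.05 cycles/bin to 0.6 cycles/octave. Both values are well below the representable limits of 50 Hz and 6 cycles/octave.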
Abstract
Deep learning models have become potential candidates for auditory neuroscience research, thanks to their recent successes on a variety of auditory tasks. Yet these models often lack the interpretability needed to understand the exact computations they perform. Here, we propose a parametrized neural network layer that computes specific spectro-temporal modulations based on Gabor kernels (Learnable STRFs) and that is fully interpretable. We evaluated the predictive capabilities of this layer on Speech Activity Detection, Speaker Verification, Urban Sound Classification and Zebra Finch Call Type Classification. We found that models based on Learnable STRFs perform on par with the respective toplines on all tasks, and obtain the best performance for Speech Activity Detection. As this layer is fully interpretable, we used quantitative measures to describe the distribution of the learned…