A Model You Can Hear: Audio Identification with Playable Prototypes

Romain Loiseau; Baptiste Bouvier; Yann Teytaut; Elliot Vincent; Mathieu Aubry; Loic Landrieu

arXiv:2208.03311 · cs.SD · August 8, 2022 · 1 citation

TL;DR

This paper introduces an interpretable audio identification model using learnable spectral prototypes and transformation networks, achieving state-of-the-art results in speaker and instrument classification.

Contribution

The paper presents an interpretable approach to audio classification built on learnable spectral prototypes and transformation-invariant learning, reaching state-of-the-art accuracy while remaining interpretable.

Findings

01. Achieves state-of-the-art speaker and instrument identification accuracy.

02. Provides a model that is both interpretable and adaptable to supervised or unsupervised training.

03. Demonstrates effective clustering and classification of large audio collections.

Abstract

Machine learning techniques have proved useful for classifying and analyzing audio content. However, recent methods typically rely on abstract and high-dimensional representations that are difficult to interpret. Inspired by transformation-invariant approaches developed for image and 3D data, we propose an audio identification model based on learnable spectral prototypes. Equipped with dedicated transformation networks, these prototypes can be used to cluster and classify input audio samples from large collections of sounds. Our model can be trained with or without supervision and reaches state-of-the-art results for speaker and instrument identification, while remaining easily interpretable. The code is available at: https://github.com/romainloiseau/a-model-you-can-hear
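To make the prototype idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the paper uses learned transformation networks, whereas this sketch swaps in a closed-form least-squares gain as the only transformation, and the prototypes are fixed by hand rather than learned. The function names, labels, and toy spectra are all hypothetical. Each class owns one spectral prototype, a frame is explained by the prototype that reconstructs it best after the gain is applied, and a clip is assigned to the class with the lowest mean reconstruction error.

```python
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[float]  # magnitudes over frequency bins for one frame


def best_gain(prototype: Spectrum, frame: Spectrum) -> float:
    """Scalar gain g minimizing ||g * prototype - frame||^2 (least squares).

    Assumption: stands in for the learned transformation networks of the paper.
    """
    num = sum(p * x for p, x in zip(prototype, frame))
    den = sum(p * p for p in prototype)
    return num / den if den > 0 else 0.0


def reconstruction_error(prototype: Spectrum, frame: Spectrum) -> float:
    """Squared error between the frame and the gain-adjusted prototype."""
    g = best_gain(prototype, frame)
    return sum((g * p - x) ** 2 for p, x in zip(prototype, frame))


def classify(
    frames: List[Spectrum], prototypes: Dict[str, Spectrum]
) -> Tuple[str, Dict[str, float]]:
    """Return the label whose prototype best reconstructs the frames on average."""
    errors = {
        label: sum(reconstruction_error(proto, f) for f in frames) / len(frames)
        for label, proto in prototypes.items()
    }
    return min(errors, key=errors.get), errors


# Hypothetical toy data: two hand-written prototypes over four frequency bins.
prototypes = {"flute": [0.0, 1.0, 0.0, 0.0], "cello": [1.0, 0.0, 0.5, 0.0]}
frames = [[0.0, 2.0, 0.0, 0.0], [0.1, 1.5, 0.0, 0.0]]

label, errors = classify(frames, prototypes)
print(label, errors)
```

Because the decision depends only on how well each prototype reconstructs the input, the prototypes themselves can be inspected directly, which is the sense in which such a model stays interpretable. In an unsupervised setting the same reconstruction error could serve as a clustering criterion, with prototypes no longer tied to known labels.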

Code & Models

Repositories

romainloiseau/a-model-you-can-hear (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies