Benchmarking Representations for Speech, Music, and Acoustic Events

Moreno La Quatra; Alkis Koudounas; Lorenzo Vaiani; Elena Baralis; Luca Cagliero; Paolo Garza; Sabato Marco Siniscalchi

arXiv:2405.00934·eess.AS·September 17, 2024

TL;DR

This paper introduces ARCH, a comprehensive benchmark with 12 datasets for evaluating audio representation learning methods across speech, music, and acoustic events, facilitating systematic comparison and progress in the field.

Contribution

The paper presents ARCH, a unified benchmark for diverse audio domains, and releases new pre-trained models for non-speech audio, enabling better evaluation and development of ARL methods.

Findings

01 ARCH enables thorough assessment of SSL models across domains

02 New pre-trained models show strong performance on non-speech datasets

03 Benchmarking reveals strengths and gaps in current ARL methods

Abstract

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods,…
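
The abstract describes a benchmark that offers unified access to many classification datasets and can readily take on new datasets and models. As a rough illustration of that idea only, the sketch below keeps datasets and encoders in two registries and scores every pair. It is not the ARCH API: all names (`register_dataset`, `register_model`, `run_benchmark`, `toy_encoder`, `toy_loudness`) are hypothetical, the data is a toy list of numbers rather than audio, and the nearest-centroid classifier on frozen embeddings is an assumed stand-in, since the evaluation protocol is not stated in this excerpt.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: an encoder maps a raw clip to a fixed-size embedding.
Clip = Sequence[float]
Embedding = List[float]
Encoder = Callable[[Clip], Embedding]
Split = List[Tuple[Clip, str]]  # (clip, label) pairs

DATASETS: Dict[str, Tuple[Split, Split]] = {}  # name -> (train, test)
MODELS: Dict[str, Encoder] = {}                # name -> frozen encoder


def register_dataset(name: str, train: Split, test: Split) -> None:
    DATASETS[name] = (train, test)


def register_model(name: str, encoder: Encoder) -> None:
    MODELS[name] = encoder


def class_centroids(encoder: Encoder, train: Split) -> Dict[str, Embedding]:
    """Average the embeddings of each class in the training split."""
    sums: Dict[str, Embedding] = {}
    counts: Dict[str, int] = defaultdict(int)
    for clip, label in train:
        emb = encoder(clip)
        prev = sums.get(label, [0.0] * len(emb))
        sums[label] = [s + e for s, e in zip(prev, emb)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def evaluate(encoder: Encoder, train: Split, test: Split) -> float:
    """Accuracy of a nearest-centroid classifier on frozen embeddings."""
    cents = class_centroids(encoder, train)
    correct = 0
    for clip, label in test:
        emb = encoder(clip)
        pred = min(
            cents,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(emb, cents[lab])),
        )
        correct += int(pred == label)
    return correct / len(test)


def run_benchmark() -> Dict[Tuple[str, str], float]:
    """Score every registered model on every registered dataset."""
    return {
        (m, d): evaluate(enc, train, test)
        for m, enc in MODELS.items()
        for d, (train, test) in DATASETS.items()
    }


def toy_encoder(clip: Clip) -> Embedding:
    # Toy "embedding": mean and mean absolute amplitude of the clip.
    return [sum(clip) / len(clip), sum(abs(x) for x in clip) / len(clip)]


register_model("toy_encoder", toy_encoder)
register_dataset(
    "toy_loudness",
    train=[([0.1, -0.1, 0.1], "quiet"), ([0.9, -0.8, 0.9], "loud")],
    test=[([0.05, -0.1, 0.1], "quiet"), ([1.0, -0.9, 0.8], "loud")],
)

if __name__ == "__main__":
    for (model, dataset), acc in run_benchmark().items():
        print(f"{model} on {dataset}: accuracy {acc:.2f}")
```

With this layout, adding a dataset or a model is a single registration call, and the scoring loop does not change. A real harness would load audio, run a pre-trained network, and most likely train a stronger classifier on the embeddings.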

Code & Models

Repositories

MorenoLaQuatra/ARCH (PyTorch, official)

Taxonomy

Topics: Music Technology and Sound Studies
