Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

TL;DR
This paper introduces ARCH, a comprehensive benchmark with 12 datasets for evaluating audio representation learning methods across speech, music, and acoustic events, facilitating systematic comparison and progress in the field.
Contribution
The paper presents ARCH, a unified benchmark for diverse audio domains, and releases new pre-trained models for non-speech audio, enabling better evaluation and development of ARL methods.
Findings
ARCH enables thorough assessment of SSL models across domains
New pre-trained models show strong performance on non-speech datasets
Benchmarking reveals strengths and gaps in current ARL methods
Abstract
Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods,…
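The abstract describes scoring several pre-trained models on several classification datasets through one unified interface. The sketch below illustrates that general idea only and is not ARCH's actual API or protocol. Every name in it (`centroid_probe_accuracy`, `run_benchmark`, the toy encoders and the toy dataset) is hypothetical, and the nearest-centroid probe on frozen embeddings is an assumed stand-in for whatever evaluation head the benchmark really uses.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Clip = Sequence[float]                    # toy stand-in for an audio waveform
Split = List[Tuple[Clip, str]]            # (clip, label) pairs
Encoder = Callable[[Clip], List[float]]   # frozen model: clip -> embedding


def centroid_probe_accuracy(encode: Encoder, train: Split, test: Split) -> float:
    """Fit a nearest-centroid classifier on frozen embeddings; return test accuracy."""
    by_label: Dict[str, List[List[float]]] = {}
    for clip, label in train:
        by_label.setdefault(label, []).append(encode(clip))
    # One centroid per class: the per-dimension mean of that class's embeddings.
    centroids = {
        label: [fmean(dim) for dim in zip(*vecs)] for label, vecs in by_label.items()
    }

    def predict(clip: Clip) -> str:
        emb = encode(clip)
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(emb, centroids[lab])),
        )

    return sum(predict(clip) == label for clip, label in test) / len(test)


def run_benchmark(
    encoders: Dict[str, Encoder], datasets: Dict[str, Tuple[Split, Split]]
) -> Dict[Tuple[str, str], float]:
    """Score every (model, dataset) pair with the same probe."""
    return {
        (model, name): centroid_probe_accuracy(encode, train, test)
        for model, encode in encoders.items()
        for name, (train, test) in datasets.items()
    }


# Hypothetical "models": hand-written features instead of real SSL encoders.
def energy_encoder(clip: Clip) -> List[float]:
    return [fmean(x * x for x in clip)]


def mean_encoder(clip: Clip) -> List[float]:
    return [fmean(clip)]


# Hypothetical toy dataset: separate quiet clips from loud ones.
toy_train: Split = [
    ([0.0, 0.1, -0.1], "quiet"),
    ([0.05, -0.05, 0.0], "quiet"),
    ([1.0, -1.0, 0.9], "loud"),
    ([-0.8, 0.9, -1.0], "loud"),
]
toy_test: Split = [([0.1, 0.0, -0.05], "quiet"), ([0.9, -0.9, 1.0], "loud")]

results = run_benchmark(
    {"energy": energy_encoder, "mean": mean_encoder},
    {"toy_loudness": (toy_train, toy_test)},
)
for (model, dataset), accuracy in sorted(results.items()):
    print(f"{model:>6} on {dataset}: accuracy = {accuracy:.2f}")
```

Under these assumptions, adding a model or a dataset means adding one dictionary entry, which mirrors the extensibility the abstract claims. A real evaluation would replace the toy encoders with pre-trained networks and the centroid probe with the benchmark's own classifier.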
Taxonomy
Topics: Music Technology and Sound Studies
