Evaluation of Deep Audio Representations for Hearables

Fabian Gröger; Pascal Baumann; Ludovic Amruthalingam; Laurent Simon; Ruksana Giurda; Simone Lionetti

arXiv:2502.06664 · cs.SD · February 25, 2025

TL;DR

This paper introduces DEAR, a new dataset and benchmark for evaluating foundation models' ability to capture acoustic properties relevant to hearable devices, demonstrating the superiority of the BEATs model in this context.

Contribution

The paper presents DEAR, the first dataset and benchmark specifically designed to evaluate foundation models for acoustic scene understanding in hearables, and shows BEATs' leading performance.

Findings

01. BEATs significantly outperforms other models on the benchmark

02. Diverse training data enhances model applicability to auditory tasks

03. DEAR enables systematic evaluation of audio representations for hearables

Abstract

Effectively steering hearable devices requires understanding the acoustic environment around the user. In the computational analysis of sound scenes, foundation models have emerged as the state of the art to produce high-performance, robust, multi-purpose audio representations. We introduce and release Deep Evaluation of Audio Representations (DEAR), the first dataset and benchmark to evaluate the efficacy of foundation models in capturing essential acoustic properties for hearables. The dataset includes 1,158 audio tracks, each 30 seconds long, created by spatially mixing proprietary monologues with commercial, high-quality recordings of everyday acoustic scenes. Our benchmark encompasses eight tasks that assess the general context, speech sources, and technical acoustic properties of the audio scenes. Through our evaluation of four general-purpose audio representation models, we…
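For a sense of scale, the figures stated in the abstract (1,158 tracks of 30 seconds each) imply the total amount of audio in the dataset. The short sketch below only performs that arithmetic; the constant and function names are illustrative and do not come from the DEAR code base.

```python
# Dataset figures as stated in the abstract.
NUM_TRACKS = 1_158      # number of audio tracks
TRACK_SECONDS = 30      # duration of each track in seconds


def total_duration_seconds(num_tracks: int, track_seconds: int) -> int:
    """Return the total audio duration in seconds (hypothetical helper)."""
    return num_tracks * track_seconds


total_s = total_duration_seconds(NUM_TRACKS, TRACK_SECONDS)
total_h = total_s / 3600  # convert seconds to hours

print(f"{total_s} s = {total_h:.2f} h")  # 34740 s = 9.65 h
```

The dataset therefore holds a little under ten hours of audio in total.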

Code & Models

Repositories

DEAR-dataset/code
pytorch

Datasets

HSLU-AAI/DEAR
dataset · 16 downloads

Taxonomy

Topics: Speech and Audio Processing