Generative Data Augmentation Challenge: Synthesis of Room Acoustics for Speaker Distance Estimation

Jackie Lin; Georg Götz; Hermes Sampedro Llopis; Haukur Hafsteinsson; Steinar Guðjónsson; Daniel Gert Nielsen; Finnur Pind; Paris Smaragdis; Dinesh Manocha; John Hershey; Trausti Kristjansson; Minje Kim

arXiv:2501.13250·eess.AS·January 24, 2025

Open Access

TL;DR

This paper introduces a challenge for synthesizing room acoustics to augment data for improving speaker distance estimation, focusing on generating diverse room impulse responses to enhance spatial audio tasks.

Contribution

It presents a novel challenge for generating diverse room acoustics data to support spatially sensitive speech processing tasks, addressing the difficulty of precise acoustic measurement or simulation.

Findings

01. Challenge dataset and evaluation code released

02. Generative data augmentation shown as a promising solution

03. Focus on improving speaker distance estimation accuracy

Abstract

This paper describes the Synthesis of Room Acoustics challenge, held as part of the Generative Data Augmentation workshop at ICASSP 2025. The challenge defines a unique generative task designed to increase the quantity and diversity of room impulse response datasets so that they can be used for a spatially sensitive downstream task: speaker distance estimation. It identifies the technical difficulty of precisely measuring or simulating the acoustic characteristics of many rooms. As a solution, it proposes generative data augmentation as an alternative that can potentially improve various downstream tasks. The challenge website, dataset, and evaluation code are available at https://sites.google.com/view/genda2025.
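To make the augmentation idea concrete, the sketch below shows one common way a room impulse response (RIR) can turn dry speech into a distance-labeled training example: convolve the signal with the RIR. This is only an illustration, not the challenge's method or evaluation code. The function names `toy_rir` and `augment`, the 16 kHz sample rate, and the toy RIR model (a 1/r direct-path impulse followed by an exponentially decaying noise tail) are all assumptions made for the example; a challenge submission would supply generated RIRs in place of `toy_rir`.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def toy_rir(distance_m, rt60_s, sample_rate=16000, seed=0):
    """Hypothetical stand-in for a generated RIR (not a physical simulation).

    Builds a direct-path impulse delayed by the source-receiver distance,
    followed by a noise tail whose amplitude drops 60 dB over rt60_s.
    """
    rng = np.random.default_rng(seed)
    n_tail = int(rt60_s * sample_rate)
    delay = int(round(distance_m / SPEED_OF_SOUND * sample_rate))
    t = np.arange(n_tail) / sample_rate
    # 60 dB amplitude decay corresponds to a factor of 10^-3 at t = rt60_s.
    tail = 0.05 * rng.standard_normal(n_tail) * 10.0 ** (-3.0 * t / rt60_s)
    rir = np.zeros(delay + 1 + n_tail)
    rir[delay] = 1.0 / max(distance_m, 0.1)  # 1/r direct-path attenuation
    rir[delay + 1:] = tail
    return rir


def augment(dry_speech, rir, distance_m):
    """Return a (reverberant signal, distance label) training pair."""
    wet = np.convolve(dry_speech, rir)  # full linear convolution
    return wet, distance_m


if __name__ == "__main__":
    # Placeholder "speech": one second of noise at 16 kHz.
    dry = np.random.default_rng(1).standard_normal(16000)
    rir = toy_rir(distance_m=2.0, rt60_s=0.4)
    wet, label = augment(dry, rir, 2.0)
    print(len(wet), label)
```

A distance estimator would then be trained on many such pairs, with RIRs covering a wide range of rooms and source-receiver distances; the diversity of those RIRs is what the challenge aims to increase.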

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing