RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Bing Yang; Changsheng Quan; Yabo Wang; Pengyu Wang; Yujie Yang; Ying Fang; Nian Shao; Hui Bu; Xin Xu; Xiaofei Li

arXiv:2406.19959·cs.SD·October 2, 2024

TL;DR

This paper introduces RealMAN, a large-scale real-recorded dataset with microphone array recordings for improving speech enhancement and localization in diverse real-world environments, addressing the gap between simulated and real data.

Contribution

The creation of a comprehensive real-world microphone array dataset with extensive annotations for training and benchmarking speech enhancement and localization systems.

Findings

01. Provides 83.7 hours of speech data in various environments.
02. Includes 144.5 hours of background noise recordings.
03. Enables improved real-world speech processing models.

Abstract

The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse responses and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade model performance when applied in real-world scenarios. To bridge this simulation-to-real gap, this paper presents a new, relatively large-scale Real-recorded and annotated Microphone Array speech & Noise (RealMAN) dataset. The proposed dataset is valuable in two aspects: 1) benchmarking speech enhancement and localization algorithms in real scenarios; 2) offering a substantial amount of real-world training data for potentially improving the performance of real-world applications. Specifically, a 32-channel array with high-fidelity microphones is used for…

Code & Models

Repositories

Audio-WestlakeU/RealMAN (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development