LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
Joerg Schmalenstroeer, Tobias Gburrek, Reinhold Haeb-Umbach

TL;DR
LibriWASN is a new dataset for testing meeting separation, diarization, and recognition on asynchronous multi-device recordings that mimic real-world ad-hoc wireless acoustic sensor networks.
Contribution
It introduces a realistic test set of asynchronous recordings from multiple randomly placed devices, intended to advance meeting separation, diarization, and transcription systems.
Findings
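The dataset includes 29 channels recorded by nine devices (five single-channel smartphones and four microphone arrays) whose sampling clocks are not synchronized.
It provides ground-truth diarization, i.e., the timing of each speaker's activity.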
Designed for testing clock synchronization and separation algorithms.
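To make the effect of unsynchronized sampling clocks concrete, here is a minimal Python sketch that is not taken from the paper. It assumes a sampling rate offset (SRO) expressed in parts per million (ppm) and, for the numbers below, a nominal 16 kHz rate (LibriSpeech is distributed at 16 kHz; the LibriWASN recording rates are not given on this page). The helper names sro_drift_samples and resample_with_sro are hypothetical. The first computes how many samples a device gains or loses over a recording; the second simulates an asynchronous copy of a signal using linear interpolation, a simplification of the band-limited interpolation usually preferred for this purpose.

```python
from typing import List


def sro_drift_samples(sro_ppm: float, fs_nominal: float, duration_s: float) -> float:
    """Samples gained (positive) or lost (negative) by a device whose clock
    deviates from the nominal rate by sro_ppm parts per million."""
    eps = sro_ppm * 1e-6
    # The device actually samples at fs_nominal * (1 + eps); this is the
    # excess sample count accumulated over duration_s seconds.
    return eps * fs_nominal * duration_s


def resample_with_sro(x: List[float], sro_ppm: float) -> List[float]:
    """Simulate how a device with the given SRO would sample signal x.

    x is assumed to be sampled on the reference (nominal) clock. A positive
    sro_ppm means the device clock runs fast, so its n-th sample falls at
    reference position n / (1 + eps). Linear interpolation is used for
    simplicity; it only approximates band-limited resampling.
    """
    eps = sro_ppm * 1e-6
    if eps <= -1.0:
        raise ValueError("sro_ppm must be greater than -1e6")
    out: List[float] = []
    n = 0
    while True:
        t = n / (1.0 + eps)      # position of device sample n on the reference grid
        i = int(t)
        if i + 1 >= len(x):      # stop before interpolating past the end of x
            break
        frac = t - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        n += 1
    return out
```

For example, an illustrative (not measured) offset of 10 ppm at 16 kHz accumulates roughly 96 samples, or about 6 ms, of drift over a 10-minute meeting, which matters for any multi-channel processing that assumes a common clock.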
Abstract
We present LibriWASN, a data set whose design closely follows the LibriCSS meeting recognition data set, with the marked difference that the data is recorded with devices that are randomly positioned on a meeting table and whose sampling clocks are not synchronized. Nine different devices (five smartphones with a single recording channel and four microphone arrays) are used to record a total of 29 channels. Otherwise, the data set closely follows the LibriCSS design: the same LibriSpeech sentences are played back from eight loudspeakers arranged around a meeting table, and the data is organized in subsets with different percentages of speech overlap. LibriWASN is meant as a test set for clock synchronization algorithms and for meeting separation, diarization, and transcription systems on ad-hoc wireless acoustic sensor networks. Due to its similarity to LibriCSS, meeting transcription…
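The subsets differ in how much of the speech overlaps. As a hedged illustration, the Python sketch below computes an overlap ratio from ground-truth utterance segments under one plausible definition, assumed here: the time during which two or more speakers are active divided by the time during which at least one speaker is active. Both this definition and the function name overlap_ratio are assumptions for illustration; the exact definition used by LibriCSS and LibriWASN may differ.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one utterance, hypothetical format


def overlap_ratio(segments: List[Segment]) -> float:
    """Fraction of active-speech time in which at least two speakers talk."""
    events = []
    for start, end in segments:
        events.append((start, +1))  # a speaker becomes active
        events.append((end, -1))    # a speaker stops
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back utterances are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0
    prev_t = None
    speech = 0.0
    overlap = 0.0
    for t, delta in events:
        if prev_t is not None and active > 0:
            dt = t - prev_t
            speech += dt            # at least one speaker active in [prev_t, t)
            if active >= 2:
                overlap += dt       # two or more speakers active
        active += delta
        prev_t = t
    return overlap / speech if speech > 0 else 0.0
```

For instance, two utterances spanning 0 to 10 s and 8 to 20 s give 2 s of overlap within 20 s of speech, i.e. a ratio of 0.1 under this assumed definition.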
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Speech and Audio Processing · Bluetooth and Wireless Communication Technologies
