AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

TL;DR
AISHELL-4 is a comprehensive open-source Mandarin speech dataset recorded in conference scenarios, enabling research on multi-speaker processing, recognition, and diarization with realistic acoustics and detailed annotations.
Contribution
It introduces a large-scale, real-recorded Mandarin conference dataset with multi-task annotations and a baseline framework, filling a gap in non-English multi-speaker speech resources.
Findings
Provides realistic acoustics with natural speech variations.
Includes detailed annotations for speech and speaker activities.
Supports multi-task research in speech processing.
Abstract
In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. It aims to bridge advanced research on multi-speaker processing and the practical application scenario in three aspects. With real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics of conversation, such as short pauses, speech overlap, quick speaker turns, and noise. Meanwhile, accurate transcription and speaker voice activity are provided for each meeting in AISHELL-4. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech…
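Because the abstract highlights speech overlap and per-meeting speaker voice activity annotations, a small sketch may help show how such annotations could be used. The Python below is an illustration only, not part of the AISHELL-4 tooling: the `Segment` tuple layout, the `overlap_ratio` function, and the example timings are all hypothetical, and it assumes that segments belonging to the same speaker do not overlap each other.

```python
from typing import List, Tuple

# Hypothetical annotation layout: (speaker_id, start_sec, end_sec).
# This is an assumption for illustration, not the dataset's file format.
Segment = Tuple[str, float, float]


def overlap_ratio(segments: List[Segment]) -> float:
    """Return the fraction of speech time with two or more active speakers."""
    # Sweep line: +1 when a segment starts, -1 when it ends.
    events = []
    for _speaker, start, end in segments:
        if end > start:  # skip empty or malformed segments
            events.append((start, 1))
            events.append((end, -1))
    # At equal times, ends (-1) sort before starts (+1), so segments that
    # merely touch are not counted as overlapping.
    events.sort()

    active = 0      # number of speakers currently talking
    prev_t = 0.0    # time of the previous event
    speech = 0.0    # total time with at least one speaker
    overlap = 0.0   # total time with at least two speakers
    for t, delta in events:
        duration = t - prev_t
        if active >= 1:
            speech += duration
        if active >= 2:
            overlap += duration
        active += delta
        prev_t = t
    return overlap / speech if speech > 0 else 0.0


# Made-up example: S1 and S2 overlap for 1 s out of 8 s of total speech.
example_segments = [("S1", 0.0, 4.0), ("S2", 3.0, 6.0), ("S3", 8.0, 10.0)]
print(overlap_ratio(example_segments))  # → 0.125
```

The sweep-line approach processes all segment boundaries in one sorted pass, so it stays simple even for sessions with many speakers and short turns.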
Code & Models
- 🤗 pyannote/speaker-diarization-community-1 (model) · 2.0M dl · ♡ 267
- 🤗 pyannote-community/speaker-diarization-community-1 (model) · 589 dl · ♡ 3
- 🤗 jaman21/pyannote-speaker-diarization-community-1 (model) · 11 dl
- 🤗 aTrain-core/speaker-detection (model) · 64 dl
- 🤗 DroolingPanda/speaker-diarization-community-1 (model) · 5 dl
- 🤗 anchor-flux/pyannote-speaker-diarization-community-1 (model) · 1.8k dl · ♡ 1
- 🤗 beargreen/speaker-diarization-community-1 (model) · 27 dl
