The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

Jon Barker; Shinji Watanabe (CLSP); Emmanuel Vincent (MULTISPEECH); Jan Trmal (CLSP)

arXiv:1803.10609·cs.SD·March 29, 2018·5 cites

Open Access

TL;DR

The 5th CHiME Challenge advances robust speech recognition by providing a new dataset and benchmarks for distant multi-microphone ASR in realistic home environments, promoting research in signal processing and machine learning.

Contribution

This paper introduces the 5th CHiME Challenge dataset, task, and baselines, focusing on distant multi-microphone conversational ASR in natural home settings, with detailed data collection and evaluation procedures.

Findings

01 New dataset with real home environment recordings

02 Baseline systems for array synchronization and speech enhancement

03 Performance benchmarks for robustness in distant-microphone ASR

Abstract

The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinner party scenario with efforts taken to capture data that is representative of natural conversational speech and recorded by 6 Kinect microphone arrays and 4 binaural microphone pairs. The challenge features a single-array track and a multiple-array track and, for each track, distinct rankings will be produced for systems focusing on robustness with respect to distant-microphone capture vs. systems attempting to address all aspects of the task including conversational language modeling. We…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques