ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

He Wang; Pengcheng Guo; Yue Li; Ao Zhang; Jiayao Sun; Lei Xie; Wei Chen; Pan Zhou; Hui Bu; Xin Xu; Binbin Zhang; Zhuo Chen; Jian Wu; Longbiao Wang; Eng Siong Chng; Sun Li

arXiv:2401.03473·cs.SD·February 22, 2024·1 cites

Open Access

TL;DR

The ICMC-ASR challenge advances in-car multi-channel speech recognition by releasing over 100 hours of in-vehicle multi-channel recordings, defining ASR and ASDR tracks scored by CER and cpCER, and drawing 98 participating teams.

Contribution

This paper introduces the ICASSP 2024 in-car multi-channel speech recognition challenge, which builds on the ICSRC held at ISCSLP 2022 and provides new datasets, two evaluation tracks, and a competitive framework to promote research in driving scenarios.

Findings

01

First-place team USTCiflytek achieved a CER of 13.16% in the ASR track

02

Significant improvements over baseline in both tracks

03

Participation by 98 teams, with 53 valid results received, indicates strong community interest

Abstract

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum-permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER…
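The two metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the challenge's official scoring tool: the function names (`edit_distance`, `cer`, `cpcer`) and the input layout (one list of utterance strings per speaker) are assumptions made for illustration, and the sketch skips the text normalization a real scoring pipeline would normally apply. CER is the character-level edit distance divided by the reference length; cpCER concatenates each speaker's utterances, tries every assignment of hypothesis speakers to reference speakers, and keeps the assignment with the fewest errors.

```python
from itertools import permutations
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def cpcer(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Concatenated minimum-permutation CER (illustrative sketch).

    refs and hyps each hold one list of utterances per speaker.
    """
    ref_streams = ["".join(utts) for utts in refs]
    hyp_streams = ["".join(utts) for utts in hyps]
    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(ref_streams), len(hyp_streams))
    ref_streams += [""] * (n - len(ref_streams))
    hyp_streams += [""] * (n - len(hyp_streams))
    total_ref = sum(len(r) for r in ref_streams)
    if total_ref == 0:
        raise ValueError("references must contain at least one character")
    # Brute force over speaker assignments: only practical for a few speakers.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(ref_streams, perm))
        for perm in permutations(hyp_streams)
    )
    return best / total_ref
```

The brute-force search grows factorially with the number of speakers, so a production scorer would more likely solve the assignment with a polynomial-time method such as the Hungarian algorithm; the exhaustive version is used here only because it makes the definition easy to read.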

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Sparse Evolutionary Training