LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

Yan Jia; Mi Hong; Jingyu Hou; Kailong Ren; Sifan Ma; Jin Wang; Fangzhen Peng; Yinglin Ji; Lin Yang; Junjie Wang

arXiv:2210.07749·eess.AS·October 18, 2022

Open Access

TL;DR

This paper presents LeVoice ASR systems developed for the ISCSLP 2022 challenge, utilizing deep learning, data augmentation, and model fusion to achieve competitive speech recognition performance in an unconstrained model size setting.

Contribution

The paper combines deep learning based speech enhancement, data augmentation, and model fusion for robust speech recognition in a challenge track with no limit on model size.

Findings

01

Achieved 10.2% character error rate on test data

02

Fused hybrid and end-to-end architectures for improved performance

03

Ranked third among submitted systems in the challenge

Abstract

This paper describes the LeVoice automatic speech recognition systems submitted to Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task with no limit on model size. The main components are deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques, and speech recognition model fusion. We compared and fused the hybrid architecture and two kinds of end-to-end architectures. For end-to-end modeling, we used models based on the connectionist temporal classification/attention-based encoder-decoder architecture and the recurrent neural network transducer/attention-based encoder-decoder architecture. These models are evaluated with an additional language model, which is used to reduce word error rates. As a result, our system achieved 10.2% character error rate on the challenge…
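
The headline metric above is character error rate (CER). As a rough illustration of how such a figure is typically computed, the following is a minimal sketch, not the challenge's official scoring script. It assumes CER is the total character-level edit distance divided by the total number of reference characters, and that whitespace is stripped before scoring; both are assumptions, as are the function names `edit_distance` and `character_error_rate` and the example sentences.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total edits / total reference characters.
    Whitespace removal is an assumption of this sketch."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_edits += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical in-car commands; one substituted character in the first pair.
    refs = ["打开空调", "导航到机场"]
    hyps = ["打开空条", "导航到机场"]
    print(f"CER = {character_error_rate(refs, hyps):.2%}")
```

Aggregating edits over the whole test set before dividing, rather than averaging per-utterance rates, keeps long and short utterances weighted by their character counts.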

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
