LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, Jin Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang

TL;DR
This paper presents LeVoice ASR systems developed for the ISCSLP 2022 challenge, utilizing deep learning, data augmentation, and model fusion to achieve competitive speech recognition performance in an unconstrained model size setting.
Contribution
The paper introduces a combination of deep learning techniques, data augmentation, and model fusion strategies for robust speech recognition in an open model size challenge.
Findings
Achieved 10.2% character error rate on test data
Fused hybrid and end-to-end architectures for improved performance
Ranked third among submitted systems in the challenge
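For readers unfamiliar with the metric in the first finding, character error rate (CER) is the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below is a minimal illustration under that standard definition. It is not the challenge's official scoring tool, the function names are hypothetical, and stripping spaces before scoring is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total character edits divided by total reference characters."""
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: spaces are not scored, which is common for Mandarin transcripts.
        ref_chars = list(ref.replace(" ", ""))
        hyp_chars = list(hyp.replace(" ", ""))
        total_errors += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(character_error_rate(["打开空调"], ["打开空条"]))
```

Under this definition, a CER of 10.2% means roughly one character edit for every ten reference characters.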
Abstract
This paper describes the LeVoice automatic speech recognition systems submitted to Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task with no limit on model size. Our main contributions include deep-learning-based speech enhancement, text-to-speech-based speech generation, training data augmentation via various techniques, and speech recognition model fusion. We compared and fused a hybrid architecture and two kinds of end-to-end architecture. For end-to-end modeling, we used models based on the connectionist temporal classification/attention-based encoder-decoder architecture and the recurrent neural network transducer/attention-based encoder-decoder architecture. These models are evaluated with an additional language model to reduce word error rates. As a result, our system achieved 10.2% character error rate on the challenge…
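The abstract does not say how the hybrid and end-to-end systems were fused or how the language model was applied. One common approach, shown here purely as an assumption, is log-linear combination of N-best hypothesis scores with an optional language model term. The function name `fuse_nbest`, the weights, the `missing_penalty` value, and the example scores are all hypothetical and are not taken from the paper.

```python
def fuse_nbest(system_nbests, weights, lm_score=None, lm_weight=0.0,
               missing_penalty=-20.0):
    """Pick the best hypothesis by a weighted sum of per-system log scores.

    system_nbests: list of dicts mapping hypothesis text -> log score, one per system.
    weights: one interpolation weight per system.
    lm_score: optional callable returning a language model log score for a hypothesis.
    missing_penalty: log score assumed when a system did not propose a hypothesis.
    """
    if len(system_nbests) != len(weights):
        raise ValueError("need exactly one weight per system")
    # Pool every hypothesis proposed by any system; sort for deterministic tie-breaking.
    candidates = sorted(set().union(*system_nbests)) if system_nbests else []
    if not candidates:
        raise ValueError("no hypotheses to fuse")

    fused = {}
    for hyp in candidates:
        score = sum(w * nbest.get(hyp, missing_penalty)
                    for nbest, w in zip(system_nbests, weights))
        if lm_score is not None:
            score += lm_weight * lm_score(hyp)
        fused[hyp] = score
    best = max(candidates, key=lambda h: fused[h])
    return best, fused


if __name__ == "__main__":
    # Made-up scores for three systems (hybrid, CTC/attention, RNN-T/attention).
    hybrid = {"打开空调": -1.0, "打开空条": -0.8}
    ctc_att = {"打开空调": -0.5, "打开空条": -1.5}
    rnnt_att = {"打开空调": -0.7}
    best, scores = fuse_nbest([hybrid, ctc_att, rnnt_att], [1.0, 1.0, 1.0])
    print(best)
```

In a setup like this, the system weights and the language model weight would normally be tuned on a development set. Other fusion schemes, such as ROVER-style voting, are equally plausible readings of the abstract.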
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Test
