Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge
Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu

TL;DR
This paper describes Transsion TSUP's speech recognition system for the ASRU 2023 MADASR Challenge, which targets low-resource Indian languages. The system uses a squeezeformer-based acoustic model and fine-tuned IndicWhisper models, each decoded with an external KenLM language model, to improve accuracy.
Contribution
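For tracks 1 and 2, the paper introduces an ASR system with a squeezeformer encoder and a bidirectional transformer decoder, decoded with an external KenLM language model. For tracks 3 and 4, it fine-tunes pretrained IndicWhisper models for low-resource Indian languages and modifies the Whisper beam search to support an external KenLM language model.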
Findings
Achieved WER of around 15-24% for Bengali and Bhojpuri across all tracks.
Demonstrated improved recognition accuracy with external KenLM language models.
Validated effectiveness of combining traditional and pretrained models for low-resource languages.
Abstract
This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC-Attention training loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were employed and finetuned on both the challenge dataset and publicly available datasets. The whisper beam search decoding was also modified to support an external KenLM language model, which enabled better utilization of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%,…
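One common way to bring an external language model into decoding is shallow fusion, where each hypothesis is scored by its ASR log-probability plus a weighted LM log-probability. The sketch below illustrates the idea as n-best rescoring, which is simpler than the in-beam integration the abstract describes and is not the authors' implementation. The function names, the weights, and the toy unigram model standing in for KenLM are all assumptions made for illustration.

```python
import math

# Toy unigram LM standing in for a real n-gram model such as KenLM (assumption).
TOY_UNIGRAM_PROBS = {"the": 0.5, "cat": 0.3, "sat": 0.15}
UNKNOWN_WORD_PROB = 0.001


def toy_lm_logprob(text):
    """Return the natural-log probability of `text` under the toy unigram LM."""
    return sum(
        math.log(TOY_UNIGRAM_PROBS.get(word, UNKNOWN_WORD_PROB))
        for word in text.split()
    )


def fused_score(asr_logprob, lm_logprob, num_words, lm_weight, length_bonus):
    """Shallow-fusion score: ASR score + weighted LM score + per-word bonus."""
    return asr_logprob + lm_weight * lm_logprob + length_bonus * num_words


def rescore(hypotheses, lm_logprob_fn, lm_weight=0.5, length_bonus=0.0):
    """Rescore (text, asr_logprob) pairs and return (score, text), best first."""
    scored = []
    for text, asr_logprob in hypotheses:
        score = fused_score(
            asr_logprob,
            lm_logprob_fn(text),
            len(text.split()),
            lm_weight,
            length_bonus,
        )
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical n-best list: the acoustically preferred entry has a typo-like error.
    n_best = [("the cat sad", -3.9), ("the cat sat", -4.0)]
    for score, text in rescore(n_best, toy_lm_logprob):
        print(f"{score:.3f}  {text}")
```

In this toy setup the acoustic score alone slightly prefers "the cat sad", while the fused score prefers "the cat sat" because the LM penalizes the unseen word. In practice the LM weight and length bonus would be tuned on a development set.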
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
