Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

Xiaoxiao Li; Gaosheng Zhang; An Zhu; Weiyong Li; Shuming Fang; Xiaoyue Yang; Jianchao Zhu

arXiv:2307.11778·cs.CL·July 25, 2023

TL;DR

This paper describes Transsion TSUP's speech recognition system for the ASRU 2023 MADASR Challenge, which targets low-resource Indian languages. It pairs a Squeezeformer-based model and fine-tuned IndicWhisper models with external KenLM language models to improve accuracy.

Contribution

The paper introduces two systems for low-resource Indian languages: an ASR model with a Squeezeformer encoder and a bidirectional Transformer decoder, decoded with an external language model, and pretrained IndicWhisper models that are fine-tuned and also decoded with an external language model.

Findings

01

Achieved WER of around 15-24% for Bengali and Bhojpuri across all tracks.

02

Demonstrated improved recognition accuracy with external KenLM language models.

03

Validated effectiveness of combining traditional and pretrained models for low-resource languages.
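
The findings above are reported as word error rate (WER). As a reminder of what that number measures, here is a minimal sketch assuming whitespace tokenization and the standard definition (word-level edit distance divided by reference length). The function name `word_error_rate` is hypothetical, and the challenge's official scoring script may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, the reference "a b c d" against the hypothesis "a x c" has one substitution and one deletion, so the WER is 2/4 = 0.5.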

Abstract

This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model used a Squeezeformer encoder and a bidirectional Transformer decoder trained with a joint CTC-Attention loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were fine-tuned on both the challenge dataset and publicly available datasets. The Whisper beam search decoding was also modified to support an external KenLM language model, which enabled better use of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%,…
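
The abstract says an external language model contributes to beam search decoding. One common way to combine the two models is to add a weighted language-model log-probability to the acoustic score. The sketch below illustrates that idea as n-best rescoring, which is a simplification and not the authors' in-decoder integration. The names `fused_score`, `rescore_nbest`, and `toy_unigram_lm`, the weights, and the toy vocabulary are all assumptions for illustration; the toy unigram model stands in for a real KenLM model.

```python
import math
from typing import Callable, List, Tuple


def fused_score(asr_logp: float, lm_logp: float, num_words: int,
                lm_weight: float = 0.5, word_bonus: float = 0.0) -> float:
    """Combine ASR and LM log-probabilities, with an optional per-word bonus."""
    return asr_logp + lm_weight * lm_logp + word_bonus * num_words


def rescore_nbest(nbest: List[Tuple[str, float]],
                  lm_logprob: Callable[[str], float],
                  lm_weight: float = 0.5,
                  word_bonus: float = 0.0) -> List[Tuple[str, float]]:
    """Re-rank (text, asr_logp) hypotheses by fused score, best first."""
    scored = [
        (text, fused_score(asr_logp, lm_logprob(text), len(text.split()),
                           lm_weight, word_bonus))
        for text, asr_logp in nbest
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_unigram_lm(text: str) -> float:
    """Hypothetical stand-in for an n-gram LM: sum of unigram log-probabilities."""
    probs = {"the": 0.5, "cat": 0.3, "sat": 0.19}
    floor = 0.01  # probability assigned to unseen words
    return sum(math.log(probs.get(word, floor)) for word in text.split())


# The ASR model slightly prefers the wrong word; the LM corrects the ranking.
nbest = [("the cat sad", -1.0), ("the cat sat", -1.2)]
best_text, best_score = rescore_nbest(nbest, toy_unigram_lm, lm_weight=0.5)[0]
```

With `lm_weight=0.0` the ranking falls back to the ASR scores alone, which makes the weight a convenient knob to tune on a development set.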

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques