The NPU-ASLP System for the ISCSLP 2022 Magichub Code-Switching ASR Challenge

Yuhao Liang; Peikun Chen; Fan Yu; Xinfa Zhu; Tianyi Xu; Lei Xie

arXiv:2210.14448 · cs.SD · October 27, 2022

TL;DR

This paper describes the NPU-ASLP system for the ISCSLP 2022 Magichub Code-Switching ASR Challenge. It explores multiple end-to-end architectures, language models, data augmentation techniques, and ROVER-based hypothesis fusion, reaching 16.87% mix error rate (MER) on the test set and 2nd place in the challenge.

Contribution

The paper introduces a multi-faceted ASR system combining various architectures, language models, and data augmentation methods, with effective hypothesis fusion, for improved code-switching speech recognition.

Findings

01 Achieved 16.87% MER on the test set

02 Utilized diverse architectures and training strategies

03 Effective data augmentation and hypothesis fusion

Abstract

This paper describes our NPU-ASLP system submitted to the ISCSLP 2022 Magichub Code-Switching ASR Challenge. We first explore several popular end-to-end ASR architectures and training strategies, including bi-encoder, language-aware encoder (LAE) and mixture of experts (MoE). To improve the system's language modeling ability, we further try an internal language model as well as a long-context language model. Given the limited training data in the challenge, we investigate the effects of data augmentation, including speed perturbation, pitch shifting, speech codec, SpecAugment and synthetic data from text-to-speech (TTS). Finally, we explore ROVER-based score fusion to make full use of complementary hypotheses from different models. Our submitted system achieves a mix error rate (MER) of 16.87% on the test set and takes 2nd place in the challenge…
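The abstract reports results as mix error rate (MER) without defining it. For Mandarin-English code-switching, MER is commonly computed as an edit-distance error rate in which each Mandarin character and each English word counts as one token. The sketch below illustrates that convention; it is not the challenge's official scoring script, and the tokenization rule, the function names (`tokenize`, `edit_distance`, `mix_error_rate`) and the example sentences are all assumptions made for illustration.

```python
import re

# Assumed tokenization: one token per CJK character, one token per run of
# Latin letters (apostrophes allowed). Punctuation and spaces are dropped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split a mixed Mandarin-English string into MER tokens."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def mix_error_rate(refs, hyps):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference contains no tokens")
    return errors / total
```

With the made-up reference "我想去 shopping mall" (five tokens) and hypothesis "我想 shopping more", the sketch counts one deleted character and one substituted word, giving an MER of 2/5.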

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems