Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

TL;DR
This paper introduces a novel kNN-CTC-based framework with gated datastore selection for zero-shot Chinese-English code-switching ASR. By retrieving from dual monolingual datastores instead of a single bilingual one, it reduces cross-language noise and significantly improves recognition performance.
Contribution
It proposes dual monolingual datastores with a gated selection mechanism to enhance multilingual ASR, addressing the cross-language noise that existing kNN-CTC models introduce in code-switching scenarios.
Findings
Gated datastore mechanism improves ASR accuracy.
Dual monolingual datastores reduce language interference.
Significant performance gains in zero-shot Chinese-English CS-ASR.
Abstract
The kNN-CTC model has proven effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios such as code-switching presents challenges. Although there is potential for performance improvement, a kNN-CTC model that uses a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring that language-specific information is injected into the ASR process. We apply this framework to cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of…
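The per-frame datastore selection described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical Python illustration, not the authors' implementation. Every name here (`knn_distribution`, `gated_knn_ctc`, `lam`, `zh_tokens`) is invented for the example. The following details are also assumptions: the datastore format (frame feature, token label) pairs, squared-L2 retrieval, softmax weighting over negative distances, linear interpolation with the CTC posteriors, and a gate that picks the store from the language of the CTC model's greedy token while leaving blank frames untouched. The paper's actual gating rule may differ.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical datastore format: (frame feature vector, token label) pairs.
Datastore = List[Tuple[List[float], int]]


def knn_distribution(query: Sequence[float], store: Datastore,
                     vocab_size: int, k: int = 4,
                     temperature: float = 1.0) -> List[float]:
    """Token distribution built from the k nearest entries (squared L2)."""
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), label)
        for key, label in store
    )[:k]
    d_min = nearest[0][0]  # shift distances for a numerically stable softmax
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in nearest]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, label) in zip(weights, nearest):
        probs[label] += w / total
    return probs


def gated_knn_ctc(frame_feats: Sequence[Sequence[float]],
                  ctc_probs: Sequence[Sequence[float]],
                  zh_store: Datastore, en_store: Datastore,
                  zh_tokens: Set[int], blank: int = 0,
                  lam: float = 0.3, k: int = 4) -> List[List[float]]:
    """Interpolate CTC posteriors with kNN output from ONE monolingual store per frame."""
    fused = []
    for feat, p_ctc in zip(frame_feats, ctc_probs):
        top = max(range(len(p_ctc)), key=p_ctc.__getitem__)
        # Assumed gate: the language of the greedy CTC token picks the datastore.
        store = zh_store if top in zh_tokens else en_store
        if top == blank or not store:
            fused.append(list(p_ctc))  # blank frames keep the plain CTC output
            continue
        p_knn = knn_distribution(feat, store, len(p_ctc), k)
        fused.append([lam * a + (1.0 - lam) * b for a, b in zip(p_knn, p_ctc)])
    return fused


# Toy vocabulary: 0 = blank, 1-2 = Chinese tokens, 3-4 = English tokens.
zh_store = [([0.0, 1.0], 1), ([0.1, 0.9], 2)]
en_store = [([1.0, 0.0], 3), ([0.9, 0.1], 4)]
feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
ctc = [[0.1, 0.5, 0.2, 0.1, 0.1],       # greedy token 1 -> Chinese store
       [0.1, 0.1, 0.1, 0.6, 0.1],       # greedy token 3 -> English store
       [0.8, 0.05, 0.05, 0.05, 0.05]]   # greedy blank -> left unchanged
fused = gated_knn_ctc(feats, ctc, zh_store, en_store, {1, 2}, k=2)
```

In this toy run, the Chinese frame receives kNN mass only on Chinese tokens and the English frame only on English tokens. This hard routing is what keeps the alternative language's neighbors out of retrieval. A single bilingual datastore would instead let both languages compete for the k nearest slots.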
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Network Packet Processing and Optimization
