XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition

Xucheng Wan; Naijun Zheng; Kai Liu; Huan Zhou

arXiv:2408.10524 · cs.CL · August 21, 2024

TL;DR

This paper introduces XCB, a novel cross-lingual biasing method that improves code-switching speech recognition by enhancing recognition of secondary language phrases without extra inference costs.

Contribution

The study proposes a Cross-lingual Contextual Biasing (XCB) module that augments pre-trained ASR models for better bilingual phrase recognition in code-switching scenarios.

Findings

1. Significant improvement in recognizing secondary language phrases.
2. Effective on in-house and unseen test datasets.
3. No additional inference overhead.

Abstract

Contextualized ASR models have been shown to improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle in bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make an initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing (XCB) module. Specifically, we augment a pre-trained ASR model for the dominant language by integrating an auxiliary language biasing module and a supplementary language-specific loss, aimed at enhancing the recognition of phrases in the secondary language. Experiments on our in-house code-switching dataset validate the efficacy of our approach, demonstrating significant improvements in the recognition of biasing phrases in the secondary language, even without any…
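
As a rough illustration of what a "supplementary language-specific loss" could look like, the sketch below adds a weighted cross-entropy term computed only over secondary-language tokens to the usual loss over all tokens. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names `cross_entropy` and `combined_loss`, the per-token `is_secondary` mask, and the mixing coefficient `weight` are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under one distribution."""
    return -math.log(probs[target])


def combined_loss(step_probs, targets, is_secondary, weight=0.3):
    """Main loss over all tokens plus a weighted loss over secondary-language tokens.

    step_probs:   list of per-step probability distributions (lists of floats)
    targets:      list of target token indices, one per step
    is_secondary: list of bools, True where the target token belongs to the
                  secondary language (hypothetical mask, assumed to be given)
    weight:       hypothetical mixing coefficient, not a value from the paper
    """
    per_token = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    main = sum(per_token) / len(per_token)

    # Auxiliary term: average loss restricted to secondary-language positions.
    secondary = [loss for loss, flag in zip(per_token, is_secondary) if flag]
    aux = sum(secondary) / len(secondary) if secondary else 0.0

    return main + weight * aux, main, aux


# Toy example: two decoding steps, the second target is a secondary-language token.
total, main, aux = combined_loss(
    step_probs=[[0.5, 0.5], [0.25, 0.75]],
    targets=[0, 0],
    is_secondary=[False, True],
)
print(f"main={main:.4f} aux={aux:.4f} total={total:.4f}")
```

In this toy setup the auxiliary term only affects training, which is at least consistent with the page's claim of no additional inference overhead; how the paper actually defines and weights its loss is not stated here.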

Taxonomy

Topics: Speech Recognition and Synthesis