Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

Hexin Liu; Haihua Xu; Leibny Paola Garcia; Andy W. H. Khong; Yi He; Sanjeev Khudanpur

arXiv:2210.14567·eess.AS·October 27, 2022·1 cites

Open Access · 1 Repo

TL;DR

This paper proposes methods to reduce language confusion in code-switching speech recognition by incorporating token-level language information and disentangling language features, leading to improved recognition accuracy.

Contribution

It introduces a novel approach combining language posterior bias and adversarial disentangling to enhance code-switching speech recognition performance.
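The adversarial disentangling idea can be illustrated with a gradient reversal step, a common way to train features adversarially. This is a minimal sketch under stated assumptions, not the authors' implementation: this page does not say how the paper realizes adversarial training, and the toy classifier, the feature vector `h`, and all numeric values are hypothetical. The sketch shows that following the reversed gradient makes a language classifier's loss go up, which pushes features toward being language-agnostic.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def language_loss(h, w, b, y):
    """Binary cross-entropy of a linear language classifier on feature h."""
    p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_wrt_feature(h, w, b, y):
    """Gradient of the classifier loss with respect to the feature h."""
    p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
    return [(p - y) * wi for wi in w]

def gradient_reversal(grad, lam=1.0):
    """Backward pass of a gradient reversal layer: scale the gradient by -lam.
    The forward pass of such a layer is the identity, so it is omitted here."""
    return [-lam * g for g in grad]

# Hypothetical toy values (assumptions, not taken from the paper).
h = [0.5, -1.0]   # encoder feature for one frame or token
w = [1.0, 2.0]    # language classifier weights
b = 0.0           # language classifier bias
y = 1             # language label (0 or 1)

loss_before = language_loss(h, w, b, y)

# A real model would update encoder parameters; for brevity the feature
# itself is updated here with one gradient-descent step.
reversed_grad = gradient_reversal(grad_wrt_feature(h, w, b, y), lam=1.0)
lr = 0.1
h_new = [hi - lr * g for hi, g in zip(h, reversed_grad)]

loss_after = language_loss(h_new, w, b, y)
print(loss_before, loss_after)
```

In this toy setting the classifier loss increases after the step, because the feature moves in the direction that makes the two languages harder to tell apart.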

Findings

01. Language posterior bias outperforms disentangling in reducing confusion.

02. Joint optimization with language diarization improves recognition accuracy.

03. Incorporating language information is more effective than disentangling.

Abstract

Code-switching (CS) refers to the alternation of languages within a single speech signal, which causes language confusion in automatic speech recognition (ASR). This paper addresses language confusion to improve CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with token-level language posteriors, which are the outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize the two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both joint optimization with LD and the language posterior bias improve performance. The comparison of the proposed methods indicates that…
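One way to picture biasing a recognizer with token-level language posteriors is sketched below. It is an illustration under assumptions, not the paper's method: the abstract does not state how the bias is applied, so the log-space addition, the `weight` parameter, the toy vocabulary, and the per-token language tags are all hypothetical. The only detail taken from outside this page is that SEAME is a Mandarin-English corpus.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary; each ASR token is tagged with a language.
VOCAB = ["ni", "hao", "hello", "world"]
TOKEN_LANG = [0, 0, 1, 1]  # 0 = Mandarin, 1 = English

def bias_with_language_posterior(asr_logits, ld_logits, weight=1.0):
    """Add the log language posterior of each token's language to its ASR logit.
    This log-space bias is an assumption made for illustration only."""
    lang_post = softmax(ld_logits)
    biased = [
        logit + weight * math.log(lang_post[TOKEN_LANG[i]])
        for i, logit in enumerate(asr_logits)
    ]
    return softmax(biased)

# ASR scores for one output position: "ni" and "hello" are nearly tied.
asr_logits = [1.0, 0.2, 1.1, 0.3]
# The auxiliary language diarization output favors Mandarin at this position.
ld_logits = [2.0, 0.0]

plain = softmax(asr_logits)
biased = bias_with_language_posterior(asr_logits, ld_logits)

print(VOCAB[plain.index(max(plain))])
print(VOCAB[biased.index(max(biased))])
```

With these toy numbers the unbiased scores slightly prefer an English token, while the biased scores prefer a Mandarin token, which is the kind of confusion between similar-scoring tokens across languages that language information is meant to resolve.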

Code & Models

Repositories

Lhx94As/reducing_language_confusion
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing