Generative error correction for code-switching speech recognition using large language models

Chen Chen; Yuchen Hu; Chao-Han Huck Yang; Hexin Liu; Sabato Marco Siniscalchi; Eng Siong Chng

arXiv:2310.13013·cs.CL·October 23, 2023·1 cites

Open Access

TL;DR

This paper introduces a generative error correction approach that uses large language models, multiple ASR hypotheses, and a trainable adapter to improve code-switching speech recognition accuracy. It is especially effective in low-resource scenarios.

Contribution

It proposes a novel generative error correction method with LLMs and a trainable adapter for CS-ASR, shifting from traditional rescoring techniques and addressing data scarcity.

Findings

01 Significant reduction in mixed error rate (MER) for CS-ASR.

02 LLMs demonstrate high data efficiency for hypotheses-to-transcription learning.

03 The method outperforms traditional rescoring approaches.
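
The first finding is stated in terms of mixed error rate (MER), which this page does not define. As a minimal sketch, assume the convention common for Mandarin-English code-switching: each Chinese character counts as one token, each non-Chinese word counts as one token, and MER is the token-level edit distance divided by the reference length. The function names, the tokenization regex, and the lowercasing step are illustrative assumptions, not the paper's scoring script.

```python
from __future__ import annotations

import re

# Assumption: CJK Unified Ideographs are scored per character, and every other
# whitespace-delimited run is scored as one word. Punctuation is not stripped.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def tokenize(text: str) -> list[str]:
    """Split mixed-language text into character tokens and word tokens."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level edit distance divided by the number of reference tokens."""
    ref_tokens = tokenize(reference)
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, tokenize(hypothesis)) / len(ref_tokens)


if __name__ == "__main__":
    # "today" -> "to" is one substitution, "day" is one insertion: 2 / 5 tokens.
    print(mixed_error_rate("我想吃 pizza today", "我想吃 pizza to day"))
```

Scoring Chinese per character and other text per word avoids depending on a Chinese word segmenter, which is the usual motivation for a mixed metric.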

Abstract

Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite recent advances in automatic speech recognition (ASR), CS-ASR remains a challenging task owing to the grammatical complexity of the phenomenon and the scarcity of dedicated training corpora. In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by ASR systems to address the CS problem. Specifically, we first employ multiple well-trained ASR models for N-best hypotheses generation, with the aim of making the set of hypotheses more diverse and informative. Next, we utilize the LLMs to learn the hypotheses-to-transcription (H2T) mapping by adding a trainable low-rank adapter. Such a generative error correction (GER) method directly predicts the accurate transcription according to its expert linguistic…
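
The data side of the H2T setup described in the abstract can be sketched as follows: N-best lists from several ASR models are pooled into one deduplicated list, then formatted as a prompt whose training target is the ground-truth transcription. The helper names (`pool_hypotheses`, `build_h2t_example`), the rank-interleaved merge order, and the prompt wording are all hypothetical, since the abstract does not specify them. Fine-tuning the low-rank adapter on such prompt/target pairs is not shown.

```python
from __future__ import annotations


def pool_hypotheses(nbest_lists: list[list[str]],
                    max_total: int | None = None) -> list[str]:
    """Merge N-best lists from several ASR models, dropping exact duplicates.

    Hypotheses are interleaved by rank (all rank-1 candidates first, then all
    rank-2 candidates, and so on). This ordering is an illustrative choice.
    """
    pooled: list[str] = []
    seen: set[str] = set()
    longest = max((len(nbest) for nbest in nbest_lists), default=0)
    for rank in range(longest):
        for nbest in nbest_lists:
            if rank < len(nbest):
                hyp = nbest[rank].strip()
                if hyp and hyp not in seen:
                    seen.add(hyp)
                    pooled.append(hyp)
    return pooled if max_total is None else pooled[:max_total]


def build_h2t_example(nbest_lists: list[list[str]],
                      transcription: str | None = None) -> dict:
    """Format pooled hypotheses as a prompt; the target is None at inference."""
    hypotheses = pool_hypotheses(nbest_lists)
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, 1))
    prompt = (
        "Below are candidate transcriptions of one code-switching utterance "
        "produced by speech recognizers. Write the correct transcription.\n"
        f"{numbered}\nTranscription:"
    )
    return {"prompt": prompt, "target": transcription}


if __name__ == "__main__":
    model_a = ["我想吃 pizza to day", "我想吃 pizza today"]  # hypothetical output
    model_b = ["我想吃披萨 today", "我想吃 pizza today"]     # hypothetical output
    example = build_h2t_example([model_a, model_b], "我想吃 pizza today")
    print(example["prompt"])
```

Pooling across models reflects the abstract's stated aim of a more diverse and informative hypothesis set, since a single model's N-best list often differs in only a few tokens.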

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Sparse Evolutionary Training · Graph Convolutional Network · Gait Emotion Recognition