Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Jiajun Deng; Xurong Xie; Tianzi Wang; Mingyu Cui; Boyang Xue; Zengrui Jin; Guinan Li; Shujie Hu; Xunying Liu

arXiv:2302.07521·eess.AS·February 16, 2023

TL;DR

This paper introduces a confidence score-based speaker adaptation method for Conformer speech recognition systems, improving accuracy by addressing data scarcity and supervision errors with Bayesian modeling and confidence estimation.

Contribution

It proposes a novel confidence score-based unsupervised speaker adaptation approach using Bayesian learning for data-efficient and robust Conformer ASR systems.

Findings

01. Significant WER reductions on Switchboard and AMI datasets.

02. Consistent performance improvements over baseline models.

03. Effective confidence score estimation modules enhance adaptation reliability.

Abstract

Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data efficient speaker-dependent (SD) parameter representations are used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced using a confidence score-based selection of the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity…
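The confidence score-based selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` container, the `select_adaptation_data` helper, the utterance-level scores, and the 0.8 threshold are all hypothetical, and the paper's confidence estimation modules are not reproduced here. The sketch only shows the general idea of keeping the first-pass hypotheses that are less likely to be erroneous before using them as adaptation supervision.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One first-pass decoding result for an utterance (hypothetical container)."""
    utt_id: str
    text: str
    confidence: float  # assumed utterance-level score in [0, 1]


def select_adaptation_data(hyps, threshold=0.8):
    """Keep hypotheses whose confidence is at or above the (illustrative) threshold."""
    return [h for h in hyps if h.confidence >= threshold]


# Made-up example data for one speaker; scores are not taken from the paper.
hyps = [
    Hypothesis("utt1", "hello there", 0.93),
    Hypothesis("utt2", "recognise speech", 0.55),
    Hypothesis("utt3", "thanks a lot", 0.81),
]

selected = select_adaptation_data(hyps, threshold=0.8)
selected_ids = [h.utt_id for h in selected]
print(selected_ids)
```

In a setup like this, only the selected subset would be passed on to estimate the speaker-dependent parameters; raising the threshold trades adaptation data quantity for supervision quality.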

Code & Models

Repositories

jjdean321/espnet_conformer_lhuc (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Label Smoothing · Tanh Activation · Absolute Position Encodings · Adam · Sigmoid Activation · Position-Wise Feed-Forward Layer