Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

Anshu Bhatia; Sanchit Sinha; Saket Dingliwal; Karthik Gopalakrishnan; Sravan Bodapati; Katrin Kirchhoff

arXiv:2307.00453·cs.CL·July 4, 2023

Open Access

TL;DR

This paper introduces a parameter-efficient method for adapting self-supervised speech representations to non-native accents by training accent-specific residual adapters, reducing ASR word error rate relative to HuBERT-large on all 4 accents tested.

Contribution

It proposes a novel, model- and task-agnostic approach for accent adaptation of speech models via residual adapters, enhancing robustness to accented speech.
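As a rough illustration of what a residual adapter is, the sketch below implements a generic bottleneck adapter in plain Python: layer normalization, a down-projection, a ReLU, an up-projection, and a skip connection. This page does not describe the paper's exact adapter design, so the layout, the zero-initialized up-projection, the dimensions, and every name here (`ResidualAdapter`, `hidden_dim`, `bottleneck_dim`) are assumptions chosen for illustration only.

```python
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ResidualAdapter:
    """Generic bottleneck adapter: h + W_up * relu(W_down * layer_norm(h)).

    Hypothetical sketch; not the architecture from the paper.
    """

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # Small random down-projection (bottleneck_dim x hidden_dim).
        self.down = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
            for _ in range(bottleneck_dim)
        ]
        # Zero up-projection (hidden_dim x bottleneck_dim): an assumption that
        # makes the adapter an identity map before any training.
        self.up = [[0.0] * bottleneck_dim for _ in range(hidden_dim)]

    def __call__(self, h):
        z = matvec(self.down, layer_norm(h))
        z = [max(0.0, v) for v in z]  # ReLU
        delta = matvec(self.up, z)
        # Skip connection: the frozen model's representation passes through.
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        """Weights in the two projections (biases omitted in this sketch)."""
        return 2 * self.hidden_dim * self.bottleneck_dim


adapter = ResidualAdapter(hidden_dim=8, bottleneck_dim=2)
frame = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]  # made-up feature vector
adapted = adapter(frame)
```

In a setup like this, only the adapter weights would be trained while the pretrained encoder stays frozen, which is where the parameter efficiency comes from: the two small projections are the only new weights per adapted layer.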

Findings

01

Accent-specific adapters achieve a mean word error rate reduction (WERR) of 22.7% over HuBERT-large.

02

Adapting the entire encoder raises the mean WERR to 25.1%.

03

Word error rate improves over HuBERT-large on all 4 accents tested.
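WERR in the findings above is a relative measure. A minimal sketch of how such a figure is commonly computed, assuming the usual definition (the drop in word error rate divided by the baseline word error rate), follows. The WER values in the example are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
def werr(baseline_wer, adapted_wer):
    """Relative word error rate reduction, in percent.

    Assumes the common definition:
    100 * (baseline_wer - adapted_wer) / baseline_wer.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


def mean_werr(pairs):
    """Average the per-accent WERR over (baseline_wer, adapted_wer) pairs."""
    values = [werr(base, adapted) for base, adapted in pairs]
    return sum(values) / len(values)


# Made-up WERs for two hypothetical accents, NOT figures from the paper.
example_pairs = [(20.0, 15.0), (10.0, 8.0)]
per_accent = [werr(base, adapted) for base, adapted in example_pairs]
average = mean_werr(example_pairs)
```

Note that averaging per-accent WERR values, as done here, is an assumption about how a mean WERR is aggregated; pooling errors across accents before computing the reduction would generally give a different number.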

Abstract

Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose and investigate self-supervised adaptation of speech representations to such populations in a parameter-efficient way via training accent-specific residual adapters. We experiment with 4 accents and choose automatic speech recognition (ASR) as the downstream task of interest. We obtain strong word error rate reductions (WERR) over HuBERT-large for all 4 accents, with a mean WERR of 22.7% with accent-specific adapters and a mean WERR of 25.1% if the entire encoder is accent-adapted. While our…

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques