Contrastive Regularization for Accent-Robust ASR

Van-Phat Thai; Aradhya Dhruv; Duc-Thinh Pham; Sameer Alam

arXiv:2605.03297 · cs.SD · May 6, 2026

TL;DR

This paper proposes supervised contrastive learning as a simple, effective regularization method to improve accent robustness in speech recognition systems, reducing error rates on unseen accents.

Contribution

It introduces a contrastive regularization technique for CTC fine-tuning that enhances accent invariance without changing model architecture or requiring explicit accent labels.
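
As a rough illustration of how such an auxiliary term could sit next to the CTC loss, the sketch below computes a standard supervised contrastive (SupCon) loss over utterance-level embeddings and adds it to a precomputed CTC loss value. Several things here are assumptions rather than details from the paper: the embeddings are taken to be already pooled to one vector per utterance, positives are taken to be utterances sharing the same transcript label, and the names `supcon_loss`, `joint_loss`, `weight`, and `temperature` (with their default values) are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of utterance embeddings.

    Assumption: two utterances are positives when their labels match
    (for example, the same transcript spoken by different speakers).
    Anchors without any positive in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other utterance.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(ctc_loss, embeddings, labels, weight=0.1, temperature=0.1):
    """Hypothetical combined objective: CTC loss plus a weighted SupCon term."""
    return ctc_loss + weight * supcon_loss(embeddings, labels, temperature)
```

In a real training loop the same computation would be done on tensors so gradients reach the encoder; the pure-Python form is only meant to make the arithmetic explicit.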

Findings

01 Achieves up to 29% relative WER reduction on unseen accents.

02 Promotes more compact and stable encoder representations.

03 Effective across multiple pretrained encoder models.

Abstract

ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25 to 29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
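
The abstract does not define within-transcript cosine dispersion, so the following is only one plausible reading: group utterance embeddings by transcript, take the mean pairwise cosine distance inside each group, and average over groups. The function name `within_transcript_dispersion` and the choice of pairwise (rather than centroid-based) distance are assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def within_transcript_dispersion(embeddings, transcript_ids):
    """Mean pairwise cosine distance among utterances sharing a transcript.

    Lower values mean that different speakers reading the same text land
    closer together in the encoder's embedding space. Transcripts with a
    single utterance contribute nothing.
    """
    groups = defaultdict(list)
    for emb, tid in zip(embeddings, transcript_ids):
        groups[tid].append(emb)

    per_group = []
    for vecs in groups.values():
        if len(vecs) < 2:
            continue
        dists = [1.0 - cosine(u, v) for u, v in combinations(vecs, 2)]
        per_group.append(sum(dists) / len(dists))
    return sum(per_group) / len(per_group) if per_group else 0.0
```

Under this reading, comparing the value for a baseline encoder against a SupCon-regularized one on the same set of utterances would show whether the regularizer tightens same-transcript clusters.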
