Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning

Monorama Swain; Bubai Maji; Jagabandhu Mishra; Markus Schedl; Anders Søgaard; Jesper Rindom Jensen

arXiv:2510.18374 · cs.CL · January 27, 2026

Open Access

TL;DR

This paper proposes a fairness-aware finetuning method for English ASR systems that significantly reduces performance disparities across different accent groups for second-language speakers.

Contribution

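It introduces fairness-prompted finetuning with lightweight adapters, fusing the standard empirical risk minimization (ERM) cross-entropy objective with Spectral Decoupling (SD), Group-DRO, and IRM objectives to improve fairness across accent groups while maintaining overall recognition accuracy.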

Findings

01

Achieves relative improvements in macro-averaged WER of 58.7% over the large pretrained Whisper and 58.5% over SeamlessM4T.

02

Effectively reduces fairness gaps across 26 accent groups.

03

Maintains high overall recognition accuracy while improving fairness.
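
The findings above are stated in terms of macro-averaged WER and relative reduction. The following is a minimal sketch of how those two quantities are commonly computed, assuming each utterance carries an accent-group label. The function names (`word_errors`, `macro_wer`, `relative_reduction`) and the toy transcripts are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            ))
        prev = curr
    return prev[-1], len(ref)


def macro_wer(samples):
    """samples: iterable of (group, reference, hypothesis).

    Assumes every group has at least one non-empty reference.
    Returns (macro-averaged WER, per-group WER dict).
    """
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[group] += e
        words[group] += n
    per_group = {g: errors[g] / words[g] for g in errors}
    # Macro average: every group counts equally, regardless of its size.
    return sum(per_group.values()) / len(per_group), per_group


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy data with two hypothetical accent groups.
samples = [
    ("group_a", "the cat sat", "the cat sat"),
    ("group_b", "hello world", "hello word"),
]
macro, per_group = macro_wer(samples)
```

Because the macro average weights every accent group equally, a small group with a high error rate affects the score as much as a large group does, which is why it is a natural metric for fairness comparisons.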

Abstract

In this work, we address the challenge of building fair English ASR systems for second-language speakers. Our analysis of widely used ASR models, Whisper and SeamlessM4T, reveals large fluctuations in word error rate (WER) across 26 accent groups, indicating significant fairness gaps. To mitigate this, we propose fairness-prompted finetuning with lightweight adapters, incorporating Spectral Decoupling (SD), Group Distributionally Robust Optimization (Group-DRO), and Invariant Risk Minimization (IRM). Our proposed fusion of traditional empirical risk minimization (ERM) with cross-entropy and fairness-driven objectives (SD, Group-DRO, and IRM) enhances fairness across accent groups while maintaining overall recognition accuracy. In terms of macro-averaged word error rate, our approach achieves a relative improvement of 58.7% and 58.5% over the large pretrained Whisper and SeamlessM4T,…
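
To make the fusion of ERM with the fairness-driven objectives more concrete, here is a rough sketch under several assumptions that the abstract does not confirm: the terms are combined as a weighted sum, Group-DRO is simplified to the worst-group risk (rather than learned group weights), IRM uses the common IRMv1 gradient penalty, and SD is an L2 penalty on the logits. Each item is treated as a single classification rather than a token sequence. The function `fused_loss` and the weights `lam_sd`, `lam_dro`, `lam_irm` are hypothetical.

```python
import math
from collections import defaultdict


def fused_loss(logits, labels, groups, lam_sd=0.1, lam_dro=1.0, lam_irm=1.0):
    """Toy fused objective: ERM + SD + worst-group risk + IRMv1 penalty.

    logits: list of per-class score lists, labels: list of class indices,
    groups: list of group ids (e.g. accent groups). All weights are assumed.
    """
    ce_by_group = defaultdict(list)    # per-sample cross-entropy, by group
    grad_by_group = defaultdict(list)  # d/dw CE(w * z, y) at w = 1, by group
    sq_norms = []                      # squared L2 norm of each logit vector
    for z, y, g in zip(logits, labels, groups):
        m = max(z)
        lse = m + math.log(sum(math.exp(v - m) for v in z))  # log-sum-exp
        probs = [math.exp(v - lse) for v in z]
        ce_by_group[g].append(lse - z[y])
        # Closed form for cross-entropy: sum_k p_k * z_k - z_y.
        grad_by_group[g].append(sum(p * v for p, v in zip(probs, z)) - z[y])
        sq_norms.append(sum(v * v for v in z))

    all_ce = [c for cs in ce_by_group.values() for c in cs]
    erm = sum(all_ce) / len(all_ce)                # average cross-entropy
    sd = 0.5 * sum(sq_norms) / len(sq_norms)       # logit-norm penalty
    group_risk = {g: sum(cs) / len(cs) for g, cs in ce_by_group.items()}
    dro = max(group_risk.values())                 # worst-group risk
    irm = sum((sum(gs) / len(gs)) ** 2
              for gs in grad_by_group.values()) / len(grad_by_group)

    total = erm + lam_sd * sd + lam_dro * dro + lam_irm * irm
    return total, {"erm": erm, "sd": sd, "dro": dro, "irm": irm}


# Toy batch: three items, two classes, two hypothetical accent groups.
total, parts = fused_loss(
    logits=[[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
    labels=[0, 0, 1],
    groups=["a", "b", "b"],
)
```

In a real finetuning setup these terms would be computed on model outputs inside an autodiff framework so that gradients flow to the adapter parameters; the closed-form IRM gradient here only stands in for that step.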

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing