ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages

Subham Kumar; Prakrithi Shivaprakash; Abhishek Manoharan; Astut Kurariya; Diptadhi Mukherjee; Lekhansh Shukla; Animesh Mukherjee; Prabhat Chand; Pratima Murthy

arXiv:2512.10967 · cs.CL · December 15, 2025

TL;DR

This paper systematically evaluates the performance and biases of ASR models on clinical speech in Kannada, Hindi, and Indian English, revealing significant disparities across languages, speaker roles, and gender, and highlighting the need for inclusive development in healthcare applications.

Contribution

It provides the first comprehensive multilingual benchmark and fairness analysis of ASR systems in Indian clinical settings, exposing performance gaps and biases.

Findings

01. Substantial variability in model performance across languages and speakers.

02. Systematic biases related to gender and speaker role identified.

03. Some models perform well on English but poorly on vernacular speech.

Abstract

Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we conduct the first systematic audit of ASR performance on real-world clinical interview data spanning Kannada, Hindi, and Indian English, comparing leading models including Indic Whisper, Whisper, Sarvam, Google Speech-to-Text, Gemma3n, Omnilingual, Vaani, and Gemini. We evaluate transcription accuracy across languages, speakers, and demographic subgroups, with a particular focus on error patterns affecting patients vs. clinicians and on gender-based or intersectional disparities. Our results reveal substantial variability across models and languages, with some systems performing competitively on Indian English but failing on code-mixed or vernacular speech. We also…
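As a rough illustration of the subgroup evaluation the abstract describes, the sketch below computes a pooled word error rate (WER) per subgroup. It rests on assumptions: the abstract says only "transcription accuracy", so WER is a stand-in metric, and the record fields (`language`, `role`, `gender`, `reference`, `hypothesis`), the function names, and the toy data are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(records, keys):
    """Pooled WER per subgroup: total word errors / total reference words.

    `keys` names the (hypothetical) record fields that define a subgroup,
    e.g. ["role"] or ["language", "gender"] for an intersectional view.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        ref = rec["reference"].split()
        hyp = rec["hypothesis"].split()
        errors[group] += edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy records with placeholder tokens (not real clinical data).
records = [
    {"language": "hi", "role": "patient", "gender": "F",
     "reference": "a b c d", "hypothesis": "a b x d"},
    {"language": "hi", "role": "clinician", "gender": "M",
     "reference": "a b c d", "hypothesis": "a b c d"},
]
by_role = wer_by_group(records, ["role"])
by_language_gender = wer_by_group(records, ["language", "gender"])
```

Pooling errors over all reference words in a subgroup, rather than averaging per-utterance WER, keeps very short utterances from dominating the estimate. Real evaluations would also need text normalization suited to each language and to code-mixed speech, which this sketch omits.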

Taxonomy

Topics: Radiology practices and education · Speech Recognition and Synthesis · Artificial Intelligence in Healthcare and Education