Enhancing ASR Performance in the Medical Domain for Dravidian Languages

Sri Charan Devarakonda; Ravi Sastry Kolluru; Manjula Sri Rayudu; Rashmi Kapoor; Madhu G; Anil Kumar Vuppala

arXiv:2604.19797 · eess.AS · April 23, 2026

TL;DR

This paper introduces a confidence-aware training framework that improves medical-domain ASR accuracy for the low-resource Dravidian languages Telugu and Kannada by combining real and synthetic speech data with adaptive sample weighting.

Contribution

The paper proposes a hybrid confidence mechanism, aggregated with either fixed or learnable weights, to make better use of heterogeneous (real and synthetic) data in low-resource medical ASR for Dravidian languages.

Findings

01. Significant reduction in Word Error Rate for Telugu and Kannada ASR systems.

02. Hybrid confidence-aware training outperforms standard fine-tuning baselines.

03. Incorporating language modeling further improves recognition accuracy.
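
For readers unfamiliar with the metric behind these findings, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name and the whitespace tokenization are assumptions for illustration and are not taken from the paper, which may normalize text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.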

Abstract

Automatic Speech Recognition (ASR) for low-resource Dravidian languages like Telugu and Kannada faces significant challenges in specialized medical domains due to limited annotated data and morphological complexity. This work proposes a novel confidence-aware training framework that integrates real and synthetic speech data through a hybrid confidence mechanism combining static perceptual and acoustic similarity metrics with dynamic model entropy. Unlike direct fine-tuning approaches, the proposed methodology employs both fixed-weight and learnable-weight confidence aggregation strategies to guide sample weighting during training, enabling effective utilization of heterogeneous data sources. The framework is evaluated on Telugu and Kannada medical datasets containing both real recordings and TTS-generated synthetic speech. A 5-gram KenLM language model is applied for post-decoding…
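
The hybrid confidence idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' formulation: the abstract does not give the exact static metrics, the entropy normalization, or the aggregation formula, so the function names, the use of 1 minus normalized entropy as the dynamic signal, the softmax over logits as a stand-in for learnable weights, and the confidence-weighted mean loss are all hypothetical choices.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of one distribution, scaled to [0, 1] by log(vocab size)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def dynamic_confidence(frame_posteriors):
    """Assumed dynamic signal: 1 minus the mean normalized entropy over frames."""
    mean_entropy = sum(normalized_entropy(p) for p in frame_posteriors) / len(frame_posteriors)
    return 1.0 - mean_entropy


def softmax(logits):
    """Maps unconstrained logits to positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_confidence(static_scores, dynamic_score, weights):
    """Convex combination of static scores (each in [0, 1]) and the dynamic score.

    Fixed-weight variant: pass hand-chosen weights.
    Learnable-weight variant (assumed): pass softmax(logits), where the logits
    would be trained alongside the ASR model.
    """
    signals = list(static_scores) + [dynamic_score]
    if len(signals) != len(weights):
        raise ValueError("need exactly one weight per confidence signal")
    return sum(w * s for w, s in zip(weights, signals)) / sum(weights)


def weighted_batch_loss(losses, confidences):
    """Confidence-weighted mean of per-sample losses."""
    return sum(c * l for c, l in zip(confidences, losses)) / sum(confidences)


# Hypothetical synthetic sample: two static similarity scores plus model posteriors.
posteriors = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
confidence = hybrid_confidence([0.8, 0.6], dynamic_confidence(posteriors), softmax([0.0, 0.0, 0.0]))
batch_loss = weighted_batch_loss([2.0, 4.0], [1.0, confidence])
```

In this sketch a low-confidence synthetic sample contributes less to the batch loss than a trusted real recording, which is the general effect sample weighting is meant to have.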
