Enhancing ASR Performance in the Medical Domain for Dravidian Languages
Sri Charan Devarakonda, Ravi Sastry Kolluru, Manjula Sri Rayudu, Rashmi Kapoor, Madhu G, Anil Kumar Vuppala

TL;DR
This paper introduces a confidence-aware training framework that improves medical-domain ASR accuracy for low-resource Dravidian languages by combining real and synthetic speech data with adaptive sample weighting.
Contribution
The paper proposes a hybrid confidence mechanism, with fixed or learnable aggregation weights, that makes better use of heterogeneous (real and synthetic) data in low-resource medical ASR for Dravidian languages.
Findings
Significant reduction in Word Error Rate (WER) for Telugu and Kannada medical ASR systems.
Hybrid confidence-aware training outperforms standard fine-tuning baselines.
Incorporating a 5-gram KenLM language model at the post-decoding stage further improves recognition accuracy.
Abstract
Automatic Speech Recognition (ASR) for low-resource Dravidian languages like Telugu and Kannada faces significant challenges in specialized medical domains due to limited annotated data and morphological complexity. This work proposes a novel confidence-aware training framework that integrates real and synthetic speech data through a hybrid confidence mechanism combining static perceptual and acoustic similarity metrics with dynamic model entropy. Unlike direct fine-tuning approaches, the proposed methodology employs both fixed-weight and learnable-weight confidence aggregation strategies to guide sample weighting during training, enabling effective utilization of heterogeneous data sources. The framework is evaluated on Telugu and Kannada medical datasets containing both real recordings and TTS-generated synthetic speech. A 5-gram KenLM language model is applied for post-decoding…
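The abstract does not give the formulas behind the hybrid confidence mechanism, so the following is only a minimal sketch of how such a scheme could work, not the authors' implementation. It assumes that each sample carries two static scores in [0, 1] (perceptual and acoustic similarity), that the dynamic term is one minus the mean normalized entropy of the model's frame posteriors, that the three terms are combined by a weighted sum, and that the learnable variant obtains its weights from a softmax over trainable logits. All function names and the example values are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of one distribution, scaled to [0, 1] by log(K)."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)

def entropy_confidence(frame_posteriors):
    """Dynamic term (assumed form): 1 minus mean normalized entropy over frames."""
    mean_h = sum(normalized_entropy(p) for p in frame_posteriors) / len(frame_posteriors)
    return 1.0 - mean_h

def softmax(logits):
    """Map unconstrained (learnable) logits to positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hybrid_confidence(perceptual, acoustic, entropy_conf, weights):
    """Weighted sum of the two static scores and the dynamic entropy score."""
    w_p, w_a, w_e = weights
    return w_p * perceptual + w_a * acoustic + w_e * entropy_conf

def weighted_loss(losses, confidences):
    """Confidence-weighted mean of per-sample losses."""
    total = sum(confidences)
    return sum(c * l for c, l in zip(confidences, losses)) / total

if __name__ == "__main__":
    # Hypothetical synthetic (TTS) utterance: two frames over a 4-symbol vocabulary.
    posteriors = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
    dyn = entropy_confidence(posteriors)

    fixed = (0.5, 0.3, 0.2)             # fixed-weight strategy (assumed values)
    learned = softmax([0.2, -0.1, 0.4])  # learnable-weight strategy (logits would be trained)

    print(hybrid_confidence(0.8, 0.6, dyn, fixed))
    print(hybrid_confidence(0.8, 0.6, dyn, learned))
    print(weighted_loss([2.0, 4.0], [1.0, 0.5]))
```

In a real training loop the logits passed to `softmax` would be parameters updated by the optimizer alongside the ASR model, and the per-sample losses would come from the model's training objective; both details are outside what the abstract specifies.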
