Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics

Kabir Kumar

arXiv:2502.13982·eess.AS·February 21, 2025

TL;DR

This paper evaluates a two-stage system combining fine-tuned ASR and LLMs for medical diagnostics, emphasizing robustness to diverse audio conditions and improving automated patient support.

Contribution

It introduces a novel audio preprocessing strategy to enhance robustness and analyzes the effectiveness of combined ASR and LLM modules in medical call transcription and diagnosis.

Findings

01

Robustness improved through noise and clipping augmentation.

02

Effective transcription of diverse patient speech.

03

Accurate context-aware medical diagnosis from transcribed data.

Abstract

Natural Language Processing (NLP) and voice recognition agents are rapidly transforming healthcare by enabling efficient, accessible, and professional patient support while automating grunt work. This report documents my self-directed project, in which models fine-tuned on medical call recordings are analysed through a two-stage system: Automatic Speech Recognition (ASR) for speech transcription and a Large Language Model (LLM) for context-aware, professional responses. The ASR model, fine-tuned on phone call recordings, provides generalised transcription of diverse patient speech over calls, while the LLM matches the transcribed text to a medical diagnosis. A novel audio preprocessing strategy is deployed to provide invariance to incoming recording/call data, with sufficient noise and clipping augmentation to make the pipeline robust to the type of microphone and ambient conditions the patient might have while…
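
The noise and clipping augmentation mentioned in the abstract could look roughly like the sketch below. This is an illustration only, not the paper's actual preprocessing: the function names (`add_noise`, `hard_clip`, `augment`), the choice of white Gaussian noise, the hard-clipping model, and the default SNR and clipping ranges are all assumptions made for the example.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Mix white Gaussian noise into `wave` at a target SNR in dB (assumed noise model)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def hard_clip(wave, threshold):
    """Simulate microphone saturation by limiting samples to [-threshold, threshold]."""
    return np.clip(wave, -threshold, threshold)


def augment(wave, rng, snr_db_range=(5.0, 30.0), clip_range=(0.3, 1.0)):
    """Apply random noise, then random clipping; both ranges are hypothetical defaults."""
    snr_db = rng.uniform(*snr_db_range)
    peak = np.max(np.abs(wave))
    # Clip at a random fraction of the clean signal's peak amplitude.
    threshold = rng.uniform(*clip_range) * peak
    return hard_clip(add_noise(wave, snr_db, rng), threshold)


# Usage: augment one second of a synthetic 440 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
augmented = augment(clean, rng)
```

Drawing the SNR and clipping level at random for each training example exposes the ASR model to a spread of recording conditions rather than a single fixed corruption.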

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis