Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics
Kabir Kumar

TL;DR
This paper evaluates a two-stage system combining fine-tuned ASR and LLMs for medical diagnostics, emphasizing robustness to diverse audio conditions and improving automated patient support.
Contribution
It introduces a novel audio preprocessing strategy to enhance robustness and analyzes the effectiveness of combined ASR and LLM modules in medical call transcription and diagnosis.
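As a rough illustration of how the two stages compose, the sketch below wires a transcription step into a response step. It is not the paper's implementation: `run_pipeline`, `PipelineResult`, `fake_asr`, and `fake_llm` are hypothetical names, and the two stubs stand in for the fine-tuned ASR model and the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PipelineResult:
    transcript: str  # output of stage 1 (ASR)
    response: str    # output of stage 2 (LLM)


def run_pipeline(
    audio: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    respond: Callable[[str], str],
) -> PipelineResult:
    """Run stage 1 (speech to text), then feed its text to stage 2."""
    transcript = transcribe(audio)
    response = respond(transcript)
    return PipelineResult(transcript=transcript, response=response)


# Hypothetical stand-ins; a real system would call the fine-tuned models here.
def fake_asr(audio: Sequence[float]) -> str:
    return "i have had a headache for two days"


def fake_llm(transcript: str) -> str:
    return f"Reported symptoms: {transcript}"


# One second of silence at an assumed 8 kHz telephone sampling rate.
result = run_pipeline([0.0] * 8000, fake_asr, fake_llm)
print(result.response)
```

Passing the stages in as callables keeps each one replaceable, so a transcription model or a language model can be benchmarked separately from the other.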
Findings
Robustness improved through noise and clipping augmentation.
Effective transcription of diverse patient speech.
Accurate context-aware medical diagnosis from transcribed data.
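The noise and clipping augmentation mentioned above can be sketched in a few lines. This is a minimal illustration under assumptions, not the paper's preprocessing strategy: the function names, the white Gaussian noise model, the 10 dB signal-to-noise ratio, the 0.5 clipping threshold, and the 8 kHz sampling rate are all chosen here for the example.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def clip(samples, threshold):
    """Hard-clip samples to [-threshold, threshold], as an overdriven microphone would."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def augment(samples, snr_db=10.0, clip_threshold=0.5, seed=0):
    """Apply noise, then clipping; the seed makes the augmentation repeatable."""
    rng = random.Random(seed)
    return clip(add_noise(samples, snr_db, rng), clip_threshold)


# A 440 Hz tone, 0.1 s long, at an assumed 8 kHz telephone sampling rate.
tone = [0.8 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
augmented = augment(tone)
print(len(augmented), max(abs(s) for s in augmented))
```

In training, the noise level and clipping threshold would typically be drawn at random for each recording so the model sees a range of microphone and ambient conditions.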
Abstract
Natural Language Processing (NLP) and voice recognition agents are rapidly changing healthcare by enabling efficient, accessible, and professional patient support while automating routine work. This report documents my self-directed project, in which models fine-tuned on medical call recordings are analysed through a two-stage system: Automatic Speech Recognition (ASR) for speech transcription and a Large Language Model (LLM) for context-aware, professional responses. The ASR model, fine-tuned on phone call recordings, provides generalised transcription of diverse patient speech over a call, while the LLM matches the transcribed text to a medical diagnosis. A novel audio preprocessing strategy is deployed to provide invariance to incoming recording/call data, with sufficient noise and clipping augmentation to make the pipeline robust to the type of microphone and ambient conditions the patient might have while…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis
