Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization

Ambre Marie (LaTIM); Thomas Bertin (DySoLab); Guillaume Dardenne (LaTIM); Gwenolé Quellec (LaTIM)

arXiv:2603.00086·cs.CL·May 21, 2026

TL;DR

This paper introduces an iterative multi-pass LLM post-processing approach for French clinical speech transcription and speaker diarization, reporting significant WDER reductions on suicide prevention calls and stable performance on awake neurosurgery consultations.

Contribution

It presents a multi-pass LLM post-processing architecture that alternates Speaker Recognition and Word Recognition passes, with ablation studies over four design choices: model selection, prompting strategy, pass ordering, and iteration depth.

Findings

01

Significant WDER reduction on suicide prevention conversations (p<0.05, n=18)

02

Stable performance on neurosurgery consultations

03

Zero output failures with acceptable computational cost

Abstract

Automatic speech recognition for French medical conversations remains challenging, with word error rates often exceeding 30% in spontaneous clinical speech. This study proposes a multi-pass LLM post-processing architecture alternating between Speaker Recognition and Word Recognition passes to improve transcription accuracy and speaker attribution. Ablation studies on two French clinical datasets (suicide prevention telephone counseling and preoperative awake neurosurgery consultations) investigate four design choices: model selection, prompting strategy, pass ordering, and iteration depth. Using Qwen3-Next-80B, Wilcoxon signed-rank tests confirm significant WDER reductions on suicide prevention conversations (p<0.05, n=18), while maintaining stability on awake neurosurgery consultations (n=10), with zero output failures and acceptable computational cost (RTF 0.32), suggesting…
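The alternating-pass idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `run_multipass`, `toy_speaker_pass`, and `toy_word_pass` are hypothetical, the two toy passes are rule-based stand-ins for LLM calls, and the fallback to the previous transcript on an empty output is an assumption made here for illustration. The `speaker_first` and `iterations` parameters mirror two of the ablated design choices (pass ordering and iteration depth).

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]                      # (speaker_label, text)
Pass = Callable[[List[Turn]], List[Turn]]   # one post-processing pass


def run_multipass(turns: List[Turn], speaker_pass: Pass, word_pass: Pass,
                  iterations: int = 2, speaker_first: bool = True) -> List[Turn]:
    """Alternate speaker and word correction passes for a fixed number of rounds."""
    order = [speaker_pass, word_pass] if speaker_first else [word_pass, speaker_pass]
    current = list(turns)
    for _ in range(iterations):
        for llm_pass in order:
            candidate = llm_pass(current)
            # Assumption: an empty output counts as a failed pass,
            # so the previous transcript is kept unchanged.
            if candidate:
                current = candidate
    return current


def toy_speaker_pass(turns: List[Turn]) -> List[Turn]:
    """Stand-in for an LLM speaker pass: relabel UNKNOWN turns by alternation."""
    fixed: List[Turn] = []
    for speaker, text in turns:
        if speaker == "UNKNOWN":
            previous = fixed[-1][0] if fixed else "A"
            speaker = "B" if previous == "A" else "A"
        fixed.append((speaker, text))
    return fixed


def toy_word_pass(turns: List[Turn]) -> List[Turn]:
    """Stand-in for an LLM word pass: fix one known misrecognition."""
    return [(speaker, text.replace("medcin", "médecin")) for speaker, text in turns]


draft = [("A", "bonjour"), ("UNKNOWN", "je vois le medcin"), ("A", "d'accord")]
corrected = run_multipass(draft, toy_speaker_pass, toy_word_pass, iterations=2)
```

In a real pipeline each pass would wrap a prompt to the chosen model, and the draft would come from an upstream ASR and diarization system; both are outside the scope of this sketch.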

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition