Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction

Séverin Baroudi; Yanis Labrak; Shashi Kumar; Joonas Kalda; Sergio Burdisso; Pawel Cyrta; Juan Ignacio Alvarez-Trejos; Petr Motlicek; Hervé Bredin; Ricard Marxer

arXiv:2603.06373·eess.AS·March 9, 2026

Open Access

TL;DR

This paper presents a system that combines speaker diarization and ASR to extract medical conditions from code-switched Hinglish clinical dialogues, and reports top performance in the DISPLACE-M challenge.

Contribution

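The paper proposes an End-to-End Neural Diarization with Vector Clustering (EEND-VC) approach and a domain-adapted Qwen3 ASR model for code-switched medical conversations, and benchmarks the resulting text-based cascade against multimodal end-to-end audio models on medical condition extraction.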

Findings

01

Achieved 18.59% tcpWER with adapted ASR.

02

Outperformed other models in DISPLACE-M challenge.

03

Open cascade system was highly competitive.

Abstract

Extracting patient medical conditions from code-switched clinical spoken dialogues is challenging due to rapid turn-taking and highly overlapped speech. We present a robust system evaluated on the DISPLACE-M dataset of real-world Hinglish medical conversations. We propose an End-to-End Neural Diarization with Vector Clustering (EEND-VC) approach to accurately resolve dense speaker overlaps in Doctor-Patient Conversations (DoPaCo). For transcription, we adapt a Qwen3 ASR model via domain-specific fine-tuning, Devanagari script normalization, and dialogue-level LLM error correction, achieving an 18.59% tcpWER. We benchmark open and proprietary LLMs on medical condition extraction, comparing our text-based cascade system against a multimodal End-to-End (E2E) audio framework. While proprietary E2E models set the performance ceiling, our open cascaded architecture is highly competitive,…
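
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of plain word error rate (WER), the quantity that tcpWER builds on. It is not the paper's evaluation code: tcpWER additionally takes speaker attribution and word timing into account, and none of that is modeled here. The function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Plain WER: word-level edit distance divided by reference length.

    Illustrative only. This ignores speaker labels and timestamps,
    so it is NOT tcpWER, just the basic building block.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from DISPLACE-M.
    ref = "patient has fever and cough"
    hyp = "patient has a fever cough"
    print(f"WER = {word_error_rate(ref, hyp):.2%}")
```

Under this definition, a reported score of 18.59% would correspond to roughly 19 word errors per 100 reference words, with the caveat that tcpWER also penalizes words attributed to the wrong speaker or placed at the wrong time.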

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Voice and Speech Disorders