Benchmarking Speech Systems for Frontline Health Conversations: The DISPLACE-M Challenge

Dhanya E; Ankita Meena; Manas Nanivadekar; Noumida A; Victor Azad; Ashwini Nagaraj Shenoy; Pratik Roy Chowdhuri; Shobhit Banga; Vanshika Chhabra; Chitralekha Bhat; Shareef babu Kalluri; Srikanth Raj Chetupalli; Deepu Vijayasenan; Sriram Ganapathy

arXiv:2603.02813 · eess.AS · March 6, 2026

TL;DR

This paper presents the DISPLACE-M challenge, a benchmark for evaluating speech processing systems on real-world medical conversations that involve multiple speakers, spontaneous speech and noise. It also releases a dataset and baseline systems.

Contribution

It introduces a new medical conversational dataset, defines four benchmark tasks, and provides baseline systems for evaluating speech processing in medical dialogue scenarios.

Findings

1. Baseline systems achieved measurable performance on all four tasks.
2. Evaluation results highlight the difficulty of diarization and ASR on medical conversations.
3. The dataset and benchmarks are intended to support future research in medical conversational AI.

Abstract

The DIarization and Speech Processing for LAnguage understanding in Conversational Environments - Medical (DISPLACE-M) challenge introduces a conversational AI benchmark for understanding goal-oriented, real-world medical dialogues. The challenge addresses multi-speaker interactions between frontline health workers and care seekers, characterized by spontaneous, noisy and overlapping speech. As part of the challenge, a medical conversational dataset comprising 40 hours of development recordings and 15 hours of blind evaluation recordings was released. We provided baseline systems for four tasks (speaker diarization, automatic speech recognition, topic identification and dialogue summarization) to enable consistent benchmarking. System performance is evaluated using diarization error rate (DER), time-constrained minimum-permutation word error rate (tcpWER) and ROUGE-L. This paper describes the…
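The abstract names the evaluation metrics without defining them. As a rough illustration only, the sketch below computes simplified versions of two of them: a frame-level DER that counts missed speech, false alarm and speaker confusion under the best one-to-one speaker mapping, and a ROUGE-L F1 built on the longest common subsequence (LCS) of tokens. Several choices here are assumptions, not the challenge's scoring protocol: the frame discretization, the absence of a forgiveness collar, lowercased whitespace tokenization, and equal weighting of precision and recall in ROUGE-L. The function names `frame_der`, `lcs_length` and `rouge_l_f1` are hypothetical. tcpWER, which adds time constraints and a minimum-permutation alignment across speakers, is not sketched.

```python
from itertools import permutations

# Simplified, illustrative metrics (assumption): NOT the official DISPLACE-M scorers.


def frame_der(ref_frames, hyp_frames):
    """Frame-level diarization error rate.

    Each argument is a list of sets; element i holds the speaker labels active
    in frame i (an empty set means silence, several labels mean overlap).
    No forgiveness collar is applied.
    """
    if len(ref_frames) != len(hyp_frames):
        raise ValueError("reference and hypothesis must cover the same frames")
    ref_spk = sorted(set().union(*ref_frames))
    hyp_spk = sorted(set().union(*hyp_frames))
    total_ref = sum(len(r) for r in ref_frames)
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    # Brute-force the one-to-one hypothesis-to-reference speaker mapping that
    # maximizes correctly attributed speech; None means "left unmapped".
    # Only practical for a handful of speakers.
    pool = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for perm in permutations(pool, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(
            len(r & {mapping[h] for h in h_set})
            for r, h_set in zip(ref_frames, hyp_frames)
        )
        best_correct = max(best_correct, correct)
    # Per frame, missed + false alarm + confusion == max(N_ref, N_hyp) - N_correct.
    errors = sum(max(len(r), len(h)) for r, h in zip(ref_frames, hyp_frames))
    return (errors - best_correct) / total_ref


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed weighting)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [{"A"}, {"A"}, {"B"}, {"B"}]
    hyp = [{"s1"}, {"s1"}, {"s2"}, {"s1"}]
    print(frame_der(ref, hyp))  # 0.25
    print(rouge_l_f1("the patient has fever", "patient has a fever"))  # 0.75
```

In the usage example, mapping `s1` to `A` and `s2` to `B` misattributes only the last frame, giving a DER of 1/4. The brute-force mapping grows combinatorially with the number of speakers; an assignment solver such as the Hungarian algorithm would scale better for the same objective.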

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling