TL;DR
This paper introduces IndicMedDialog, a multilingual multi-turn medical dialogue dataset for nine Indic languages, and fine-tunes a language model for accessible healthcare communication.
Contribution
It presents a new parallel multilingual medical dialogue dataset and a fine-tuned language model tailored for multi-turn healthcare conversations in Indic languages.
Findings
The dataset combines LLM-generated synthetic consultations with native-speaker-verified translations in nine Indic languages, parallel to English.
The fine-tuned model improves multilingual medical dialogue generation.
Expert evaluation confirms clinical plausibility of the system.
Abstract
Most existing medical dialogue systems operate in a single-turn question-answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The dataset extends MDDial with LLM-generated synthetic consultations, translated using TranslateGemma, verified by native speakers, and refined through a script-aware post-processing pipeline to correct phonetic, lexical, and character-spacing errors. Building on this dataset, we fine-tune IndicMedLM via parameter-efficient adaptation of a quantized small language model, incorporating optional patient pre-context to personalise multi-turn symptom elicitation. We evaluate against zero-shot multilingual…
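To make the idea of optional patient pre-context concrete, the sketch below shows one way a multi-turn consultation could be flattened into a prompt. It is an illustration only: the `Turn` and `Dialogue` classes, the field names, and the prompt layout are all hypothetical and are not taken from the paper or from the IndicMedDialog release.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str  # hypothetical labels: "patient" or "doctor"
    text: str


@dataclass
class Dialogue:
    language: str                      # e.g. "Hindi"; naming scheme is assumed
    turns: List[Turn] = field(default_factory=list)
    pre_context: Optional[str] = None  # optional patient background, may be absent


def build_prompt(dialogue: Dialogue) -> str:
    """Flatten a dialogue into a text prompt ending at the next doctor turn.

    The layout is an assumption made for illustration, not the paper's format.
    """
    lines = [f"Language: {dialogue.language}"]
    # Pre-context is included only when it is provided and non-empty.
    if dialogue.pre_context:
        lines.append(f"Patient context: {dialogue.pre_context}")
    for turn in dialogue.turns:
        lines.append(f"{turn.role.capitalize()}: {turn.text}")
    # Leave the final doctor turn open for the model to complete.
    lines.append("Doctor:")
    return "\n".join(lines)
```

Calling `build_prompt` on a dialogue without `pre_context` simply omits the context line, so the same function would serve both personalised and non-personalised consultations.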
