LLM-Based Data Generation and Clinical Skills Evaluation for Low-Resource French OSCEs
Tian Huang, Tom Bourgeade, Irina Illina

TL;DR
This paper explores using large language models to generate and evaluate French OSCE medical interviews, producing synthetic dialogues and automatic assessments to support training where annotated data is scarce.
Contribution
The paper introduces a pipeline for generating and evaluating French OSCE dialogues with LLMs, aimed at low-resource, privacy-preserving medical training tools.
Findings
Mid-size models achieve ~90% accuracy on synthetic data.
Synthetic dialogues can simulate varying student skill levels.
Open-source models are comparable to GPT-4o in evaluation accuracy.
Abstract
Objective Structured Clinical Examinations (OSCEs) are the standard method for assessing medical students' clinical and communication skills through structured patient interviews. In France, however, the organization of training sessions is limited by human and logistical constraints, restricting students' access to repeated practice and structured feedback. Recent advances in Natural Language Processing (NLP) and Large Language Models (LLMs) now offer the opportunity to automatically evaluate such medical interviews, thereby alleviating the need for human examiners during training. Yet, real French OSCE annotated transcripts remain extremely scarce, limiting reproducible research and reliable benchmarking. To address these challenges, we investigate the use of LLMs for both generating and evaluating French OSCE dialogues in a low-resource context. We introduce a controlled pipeline…
