PriMock57: A Dataset Of Primary Care Mock Consultations
Alex Papadopoulos Korfiatis, Francesco Moramarco, Radmila Sarac, Aleksandar Savkov

TL;DR
PriMock57 is a publicly available dataset of 57 simulated primary care consultations with audio, transcripts, and notes, aimed at advancing research in medical conversational ASR and note generation.
Contribution
This work introduces PriMock57, a novel high-quality dataset for clinical speech research, addressing privacy concerns and enabling benchmarking.
Findings
Dataset includes audio, transcripts, and notes for 57 consultations
Demonstrates use of dataset for benchmarking medical ASR and note generation
Facilitates research without compromising patient privacy
Abstract
Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, thus slowing down normal research practices. We detail the development of a public access, high quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.
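As an illustration of the ASR benchmarking use case, the sketch below computes word error rate (WER), a metric commonly used to score automatic transcripts against manual ones. It is a minimal sketch, not the authors' evaluation code: the function name `word_error_rate`, the lowercase-and-split normalisation, and the example utterances are all assumptions made for illustration and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and whitespace
    splitting; a real evaluation would likely also handle punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance pair, invented for this example.
manual = "the patient has a headache"
automatic = "the patient has headache"
print(word_error_rate(manual, automatic))
```

Here one reference word out of five is dropped, so the score is one deletion divided by five reference words. Scoring per utterance and then aggregating over a consultation is one possible design; pooling all words before dividing is another, and the two can give different totals.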
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Interpreting and Communication in Healthcare
