LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization

Máté Gedeon; Péter Mihajlik

arXiv:2510.23320·eess.AS·October 28, 2025

TL;DR

LibriConvo is a realistic, simulated multi-speaker conversational dataset designed to improve training and evaluation of speech recognition and diarization systems, featuring semantic coherence and natural timing.

Contribution

The paper introduces a novel pipeline for building realistic multi-speaker conversations from read literature, improving acoustic realism and contextual consistency for speech processing research.

Findings

01. Sortformer outperforms pyannote in diarization.

02. Fast Conformer-CTC achieves 7.29% WER on LibriConvo.

03. The dataset contains 240.1 hours of dialogues with 830 speakers.
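
For readers unfamiliar with the WER figure quoted above: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` is illustrative, and the scoring setup actually used in the paper is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: tokens are obtained by whitespace splitting, with no
    case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Calling `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.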

Abstract

We introduce LibriConvo, a simulated multi-speaker conversational dataset based on speaker-aware conversation simulation (SASC), designed to support training and evaluation of speaker diarization and automatic speech recognition (ASR) systems. Unlike prior resources that mostly rely on semantically disconnected utterances and implausible temporal gaps, LibriConvo ensures semantic coherence and realistic conversational timing. Our pipeline leverages CallHome with external VAD for reliable boundaries, applies compression to reduce unnaturally long silences, and organizes LibriTTS utterances by book to maintain contextual consistency. Acoustic realism is enhanced via a novel room impulse response selection procedure that ranks speaker-microphone configurations by spatial plausibility, balancing realism and diversity. The dataset comprises 240.1 hours across 1,496 dialogues with 830 unique…
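
The abstract mentions compressing unnaturally long silences but does not say how. The sketch below shows one plausible way to do it on a list of timed segments: any gap longer than a threshold keeps the threshold plus a fraction of the excess, and later segments are shifted earlier accordingly. The `Segment` type, the `compress_silences` function, and the `threshold` and `ratio` parameters with their default values are all hypothetical choices for illustration, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def compress_silences(
    segments: List[Segment],
    threshold: float = 1.0,  # hypothetical: gaps up to this length are kept as-is
    ratio: float = 0.25,     # hypothetical: fraction of the excess gap that is kept
) -> List[Segment]:
    """Shorten long silent gaps while preserving segment durations and order."""
    ordered = sorted(segments, key=lambda s: s.start)
    compressed: List[Segment] = []
    shift = 0.0                       # total time removed so far
    prev_end: Optional[float] = None  # latest speech end on the original timeline

    for seg in ordered:
        if prev_end is not None:
            gap = seg.start - prev_end
            # Overlapping speech gives a negative gap and is left untouched.
            if gap > threshold:
                shift += (gap - threshold) * (1.0 - ratio)
        compressed.append(Segment(seg.speaker, seg.start - shift, seg.end - shift))
        prev_end = seg.end if prev_end is None else max(prev_end, seg.end)

    return compressed
```

With the default values, a 4-second gap between two turns becomes 1.75 seconds (the 1-second threshold plus a quarter of the 3-second excess). Overlaps and short pauses are unchanged, so the relative timing within fast exchanges is preserved.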
