Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use

Titouan Parcollet; Yuan Tseng; Shucong Zhang; Rogier van Dalen

arXiv:2505.21578·cs.CL·May 29, 2025



TL;DR

The Loquacious Set is a large, diverse, and commercially usable English speech dataset of 25,000 hours, aimed at advancing ASR research with real-world speech variability.

Contribution

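The paper introduces a large English speech dataset that addresses the limitations it identifies in earlier datasets (licenses that industry researchers cannot use, unreliable transcriptions, incorrect audio data, and missing evaluation sets), so that both academic and industrial ASR work can rely on it.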

Findings

01. Contains 25,000 hours of diverse English speech

02. Includes hundreds of thousands of speakers with various accents

03. Designed for real-world ASR research and commercial applications

Abstract

Automatic speech recognition (ASR) research is driven by the availability of common datasets between industrial researchers and academics, encouraging comparisons and evaluations. LibriSpeech, despite its long success as an ASR benchmark, is now limited by its size and focus on clean, read speech, leading to near-zero word error rates. More recent datasets, including MOSEL, YODAS, Gigaspeech, OWSM, Libriheavy or People's Speech suffer from major limitations including licenses that researchers in the industry cannot use, unreliable transcriptions, incorrect audio data, or the lack of evaluation sets. This work presents the Loquacious Set, a 25,000-hour curated collection of commercially usable English speech. Featuring hundreds of thousands of speakers with diverse accents and a wide range of speech types (read, spontaneous, talks, clean, noisy), the Loquacious Set is designed to work…
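The abstract measures benchmark saturation by word error rate (WER). As a reminder of what that metric counts, here is a minimal sketch of the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function names and the lowercase, whitespace-split normalization are assumptions made for illustration; they are not taken from the paper, whose own scoring pipeline may normalize text differently.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance, keeping one DP row at a time."""
    # prev[j] = edits needed to turn an empty reference into hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # turning ref_words[:i] into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumed normalization (illustrative only): lowercase, whitespace split.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. A "near-zero" WER therefore means that almost every reference word is reproduced exactly.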

Code & Models

Datasets

speechbrain/LoquaciousSet (dataset, 9.6k downloads)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Emotion and Mood Recognition

Methods: Focus · Sparse Evolutionary Training