FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
Anna Povey, Katherine Povey

TL;DR
FeruzaSpeech is a 60-hour Uzbek read speech corpus with transcripts in both Cyrillic and Latin scripts, intended to improve speech recognition systems and freely available for academic research.
Contribution
The paper introduces FeruzaSpeech, a new high-quality Uzbek read speech corpus with dual-script transcripts and content drawn from a book and BBC News, and reports that integrating it improves speech recognition performance.
Findings
Improved Word Error Rates on Uzbek speech datasets
High-quality recordings from a native speaker
Availability of the corpus for academic research
Abstract
This paper introduces FeruzaSpeech, a read speech corpus of the Uzbek language with transcripts in both the Cyrillic and Latin alphabets, freely available for academic research purposes. The corpus includes 60 hours of high-quality recordings from a single native female speaker from Tashkent, Uzbekistan. The recordings consist of short excerpts from a book and BBC News. The paper discusses the improvement in Word Error Rates (WERs) on CommonVoice 16.1's Uzbek data, Uzbek Speech Corpus data, and FeruzaSpeech data upon integrating FeruzaSpeech.
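Since the abstract reports results as Word Error Rates, a minimal sketch of how WER is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `word_error_rate` and `normalize`, the punctuation set, and the example sentence are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ. The example also shows why a corpus that keeps punctuation and casing matters: scoring raw text penalizes a hypothesis that lacks them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase and drop basic punctuation.

    Apostrophes are kept on purpose, since Uzbek Latin orthography
    uses them inside words.
    """
    return text.lower().translate(str.maketrans("", "", ",.!?;:"))


reference = "Salom, dunyo!"   # illustrative transcript with punctuation and casing
hypothesis = "salom dunyo"    # illustrative recognizer output without them

raw_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

Under this sketch, the same hypothesis is scored very differently depending on whether punctuation and casing are kept, so WER figures are only comparable when the normalization is the same.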
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
