A Toolbox for Construction and Analysis of Speech Datasets
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

TL;DR
This paper introduces an open-source toolbox for constructing and analyzing speech datasets, facilitating improved dataset quality and error analysis for speech recognition and synthesis systems.
Contribution
It presents what is, to the best of the authors' knowledge, the first open-source interactive tool for speech dataset exploration, and demonstrates the toolbox by building a Russian speech dataset and analyzing existing datasets.
Findings
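The toolbox supports interactive error analysis of speech datasets.
The construction tool is applied to create a Russian speech dataset, showing practical utility.
Analysis of existing datasets (Multilingual LibriSpeech, Mozilla Common Voice) highlights common problems with speech data.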
Abstract
Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.
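As a rough illustration of the kind of error analysis the abstract describes, the sketch below computes a per-utterance word error rate (WER) between a dataset's transcripts and ASR predictions, then lists the utterances most likely to be mislabeled. It is not the toolbox's implementation: the record fields (`audio_filepath`, `text`, `pred_text`), the function names, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def utterance_wer(reference: str, hypothesis: str) -> float:
    """WER of one utterance: word edits divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(ref, hyp) / len(ref)


def flag_suspicious(records: List[Dict[str, str]],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (audio path, WER) pairs above the threshold, worst first.

    The field names are hypothetical; adapt them to your own manifest.
    """
    flagged = [(rec["audio_filepath"],
                utterance_wer(rec["text"], rec["pred_text"]))
               for rec in records]
    flagged = [item for item in flagged if item[1] > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A high WER does not by itself show that a label is wrong, since the ASR model may be the one at fault. It only narrows down which utterances are worth listening to and inspecting by hand.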
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
