A Toolbox for Construction and Analysis of Speech Datasets

Evelina Bakhturina; Vitaly Lavrukhin; Boris Ginsburg

arXiv:2104.04896·eess.AS·January 10, 2022·1 cites

TL;DR

This paper introduces an open-source toolbox for constructing and analyzing speech datasets, facilitating improved dataset quality and error analysis for speech recognition and synthesis systems.

Contribution

It presents what the authors believe to be the first open-source interactive tool for speech dataset exploration, alongside a dataset construction tool, and demonstrates both on Russian speech data and on existing datasets.

Findings

01

The toolbox supports interactive error analysis of speech datasets.

02

The tools are used to create a Russian speech dataset.

03

The same tools are applied to existing datasets (Multilingual LibriSpeech, Mozilla Common Voice) to examine common problems with speech data.

Abstract

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of this kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.
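
To make "error analysis" of a speech dataset concrete, the sketch below shows one common approach: compare each reference transcript with a recognizer's hypothesis and flag utterances whose word error rate (WER) is high, since these are likely candidates for mislabeling or misalignment. This is an illustrative assumption, not the toolbox's implementation. The record fields `id`, `text`, and `pred_text`, the function names, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        # Empty reference: perfect only if the hypothesis is empty too.
        return 0.0 if not hyp_words else float("inf")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def flag_suspect_utterances(samples: List[Dict[str, str]],
                            threshold: float = 0.5) -> List[str]:
    """Return ids of samples whose WER exceeds the (hypothetical) threshold."""
    return [s["id"] for s in samples
            if wer(s["text"], s["pred_text"]) > threshold]


if __name__ == "__main__":
    # Hypothetical records: reference transcript plus an ASR hypothesis.
    samples = [
        {"id": "utt1", "text": "hello world", "pred_text": "hello world"},
        {"id": "utt2", "text": "hello world", "pred_text": "goodbye"},
    ]
    print(flag_suspect_utterances(samples))
```

A high WER does not by itself show that the label is wrong, because the recognizer may be the one at fault. Flagged utterances are therefore best treated as a shortlist for manual inspection, ideally alongside the audio.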

Code & Models

Repositories

lumaku/ctc-segmentation
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing