AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

Jianwei Yu; Hangting Chen; Yanyao Bian; Xiang Li; Yi Luo; Jinchuan Tian; Mengyang Liu; Jiayi Jiang; Shuai Wang

arXiv:2309.13905·eess.AS·September 26, 2023

Open Access

TL;DR

AutoPrep is an automated framework that preprocesses in-the-wild speech data by enhancing speech quality and generating speaker labels and transcriptions, making large-scale speech datasets easier to use in speech technology applications.

Contribution

The paper introduces AutoPrep, a novel automatic preprocessing framework for in-the-wild speech data, addressing noise, segmentation, and labeling challenges without manual annotation.

Findings

1. Data preprocessed with AutoPrep achieves speech quality scores comparable to those of open-source TTS datasets.

2. TTS systems trained on the preprocessed data reach high speaker similarity.

3. AutoPrep automates data cleaning and annotation for large-scale speech datasets.

Abstract

Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, speech overlapping, lack of speech segmentation information, missing speaker labels, and incomplete transcriptions, which can largely hinder their usefulness. On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. The proposed AutoPrep…
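The three goals named in the abstract (enhance speech quality, generate speaker labels, produce transcriptions) can be pictured as stages applied in sequence to speech segments. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Segment` type, the stage functions, and `run_pipeline` are all hypothetical names, and each stage body is a placeholder standing in for a real model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Segment:
    """One speech segment cut from a longer recording (hypothetical type)."""
    audio_id: str
    start: float  # seconds
    end: float    # seconds
    enhanced: bool = False
    speaker: Optional[str] = None
    text: Optional[str] = None


# A stage maps a list of segments to a new list of segments.
Stage = Callable[[List[Segment]], List[Segment]]


def enhance(segments: List[Segment]) -> List[Segment]:
    # Placeholder: a real stage would run a speech enhancement model here.
    return [replace(s, enhanced=True) for s in segments]


def label_speakers(segments: List[Segment]) -> List[Segment]:
    # Placeholder: assigns one dummy label per source recording.
    return [replace(s, speaker=f"{s.audio_id}_spk0") for s in segments]


def transcribe(segments: List[Segment]) -> List[Segment]:
    # Placeholder: a real stage would call a speech recognition model here.
    return [replace(s, text="<transcript>") for s in segments]


def run_pipeline(segments: List[Segment], stages: List[Stage]) -> List[Segment]:
    """Apply each stage in order, passing its output to the next stage."""
    for stage in stages:
        segments = stage(segments)
    return segments


if __name__ == "__main__":
    raw = [Segment("rec1", 0.0, 2.5), Segment("rec1", 2.5, 6.0)]
    for seg in run_pipeline(raw, [enhance, label_speakers, transcribe]):
        print(seg)
```

Keeping each stage as a plain function over a list of segments means a stage can be swapped or reordered without touching the others; the stage order shown here is an assumption for illustration only.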

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling