FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

Dancheng Liu; Jinjun Xiong

arXiv:2406.17926·cs.CL·June 27, 2024

TL;DR

FASA is a new automatic speech aligner designed to extract high-quality aligned children's speech data from noisy datasets, significantly improving data quality and aiding children's ASR development.

Contribution

The paper introduces FASA, a flexible and automatic forced-alignment tool specifically tailored for children's speech, addressing limitations of existing tools and enhancing data quality.

Findings

01. FASA improves data quality by 13.6 times over human annotations.

02. FASA effectively extracts high-quality aligned children's speech data from noisy datasets.

03. Application on the CHILDES dataset demonstrates FASA's practical utility.

Abstract

Automatic Speech Recognition (ASR) for adult speech has recently made significant progress by employing deep neural network (DNN) models, but improvement on children's speech remains unsatisfactory because of its distinct characteristics. DNN models pre-trained on adult data often struggle to generalize to children's speech with fine-tuning because high-quality aligned children's speech is lacking. When generating datasets, human annotation is not scalable, and existing forced-alignment tools are not usable because they make impractical assumptions about the quality of the input transcriptions. To address these challenges, we propose a new forced-alignment tool, FASA, a flexible and automatic speech aligner that extracts high-quality aligned children's speech data from many of the existing noisy children's speech datasets. We demonstrate its usage on the CHILDES dataset…

Code & Models

Repositories

DanchengLiu/FASA (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing