Augmenting Polish Automatic Speech Recognition System With Synthetic Data

Łukasz Bondaruk; Jakub Kubiak; Mateusz Czyżnikiewicz

arXiv:2410.22903 · eess.AS · October 31, 2024

Open Access

TL;DR

This paper demonstrates that augmenting Polish speech recognition models with synthetic data generated by a Voicebox-based system significantly improves their performance, with results validated in the Poleval 2024 challenge.

Contribution

The paper introduces a synthetic data augmentation pipeline for Polish ASR using Voicebox, enhancing Conformer and Whisper models' accuracy.

Findings

01

Synthetic data improves model performance

02

Significant results in Poleval 2024 competition

03

Effective Voicebox-based speech synthesis pipeline

Abstract

This paper presents a system developed for submission to Poleval 2024, Task 3: Polish Automatic Speech Recognition Challenge. We describe a Voicebox-based speech synthesis pipeline and use it to augment the training of Conformer and Whisper speech recognition models with synthetic data. We show that adding synthetic speech to the training data significantly improves the results. We also present the final results achieved by our models in the competition.
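
The idea of augmenting ASR training with synthetic speech can be illustrated with a minimal sketch of how real and synthesized utterances might be combined into one training set. This is not the authors' pipeline: the `Utterance` record, the `mix_training_set` helper, and the `synthetic_ratio` parameter are hypothetical names introduced here for illustration, and the abstract does not state what mixing proportion was used.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical training record: an audio file and its transcript."""
    audio_path: str
    transcript: str
    synthetic: bool  # True if the audio came from a TTS system


def mix_training_set(real, synthetic, synthetic_ratio, seed=0):
    """Combine all real utterances with a random sample of synthetic ones.

    synthetic_ratio is the number of synthetic utterances requested per
    real utterance (an assumed knob, not a value taken from the paper).
    The sample is capped at the number of synthetic utterances available.
    """
    if synthetic_ratio < 0:
        raise ValueError("synthetic_ratio must be non-negative")
    rng = random.Random(seed)
    n_synth = min(len(synthetic), round(len(real) * synthetic_ratio))
    mixed = list(real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed


real = [Utterance(f"real_{i}.wav", "tekst", False) for i in range(4)]
synth = [Utterance(f"tts_{i}.wav", "tekst", True) for i in range(10)]
train = mix_training_set(real, synth, synthetic_ratio=0.5)
```

Keeping a `synthetic` flag on each record makes it possible to report how much of the training set is synthesized and to compare runs trained with and without the added data.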

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques