Low-Resource Self-Supervised Learning with SSL-Enhanced TTS

Po-chun Hsu; Ali Elkahky; Wei-Ning Hsu; Yossi Adi; Tu Anh Nguyen; Jade Copet; Emmanuel Dupoux; Hung-yi Lee; Abdelrahman Mohamed

arXiv:2309.17020 · eess.AS · June 5, 2024 · 1 citation

TL;DR

This paper introduces a method that uses synthetic speech generated by a TTS system to cut the amount of real speech needed for self-supervised pre-training in speech processing by 90%, with only slight performance degradation.

Contribution

It presents an approach that builds a high-quality TTS system from limited resources using SSL features, then uses the speech it synthesizes to augment a low-resource pre-training corpus, substantially reducing the real speech required for speech SSL.

Findings

01 Reduces real speech data needs by 90%

02 Incurs only slight performance degradation with the reduced real data

03 First work, to the authors' knowledge, aiming to enhance low-resource speech SSL, here with synthetic speech

Abstract

Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing.
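
The abstract describes the recipe only at a high level: synthesize a large corpus with a TTS system, then pre-train on it together with a small amount of real speech. The sketch below illustrates the data side of that idea under stated assumptions. The `tts` callable, the `Utterance` record, and the helper names are hypothetical; uniform shuffling of real and synthetic utterances is a simplification, not the paper's documented mixing scheme; and the 100-hour baseline is illustrative, not the paper's setup. It also makes the arithmetic behind a "90% reduction" concrete: keeping 10 h of real speech where an all-real baseline would use 100 h.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Utterance:
    """One pre-training utterance; `source` is "real" or "synthetic"."""
    utt_id: str
    duration_s: float
    source: str


def synthesize_corpus(texts: Iterable[str], tts: Callable[[str], float]) -> List[Utterance]:
    """Run a TTS model over unpaired text.

    `tts` is a hypothetical stand-in: it takes a sentence and returns the
    duration (in seconds) of the waveform it would generate. A real system
    would also write the audio to disk.
    """
    return [Utterance(f"syn-{i}", tts(text), "synthetic") for i, text in enumerate(texts)]


def mix_pretraining_corpus(
    real: Iterable[Utterance], synthetic: Iterable[Utterance], seed: int = 0
) -> List[Utterance]:
    """Concatenate real and synthetic utterances and shuffle them uniformly.

    Uniform mixing is an assumption of this sketch, not the paper's recipe.
    """
    mixed = list(real) + list(synthetic)
    random.Random(seed).shuffle(mixed)
    return mixed


def hours(utts: Iterable[Utterance], source: Optional[str] = None) -> float:
    """Total duration in hours, optionally restricted to one source."""
    return sum(u.duration_s for u in utts if source is None or u.source == source) / 3600.0


def real_data_reduction(real_hours: float, baseline_hours: float) -> float:
    """Fraction of real speech saved relative to an all-real baseline."""
    return 1.0 - real_hours / baseline_hours


def toy_tts(text: str) -> float:
    """Hypothetical TTS stub: pretend every sentence becomes 10 s of audio."""
    return 10.0


if __name__ == "__main__":
    # Hypothetical setup: 3,600 real utterances x 10 s = 10 h of real speech,
    # plus 32,400 synthesized sentences x 10 s = 90 h of synthetic speech.
    real = [Utterance(f"real-{i}", 10.0, "real") for i in range(3600)]
    synthetic = synthesize_corpus([f"sentence {i}" for i in range(32400)], toy_tts)
    corpus = mix_pretraining_corpus(real, synthetic)
    print(f"total: {hours(corpus):.1f} h, real: {hours(corpus, 'real'):.1f} h")
    # Compare against a hypothetical 100 h all-real baseline.
    print(f"real-speech reduction: {real_data_reduction(hours(corpus, 'real'), 100.0):.0%}")
```

Run as a script, it reports 100.0 h in total, 10.0 h of it real, and a 90% reduction in real speech relative to the hypothetical baseline. In an actual pipeline the mixed list would be written out as a manifest of waveform paths for the SSL pre-training loop to consume.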

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling