Revisiting Interpolation Augmentation for Speech-to-Text Generation

Chen Xu; Jie Wang; Xiaoqian Liu; Qianqian Dong; Chunliang Zhang; Tong Xiao; Jingbo Zhu; Dapeng Man; Wu Yang

arXiv:2406.15846·cs.CL·June 25, 2024

TL;DR

This paper examines interpolation augmentation in speech-to-text systems and shows that, with an appropriate strategy, it improves performance across models and datasets, particularly in low-resource scenarios.

Contribution

The paper provides a comprehensive analysis of how effective interpolation augmentation is in S2T tasks, an application that was previously under-explored, and offers guidelines for applying it well.

Findings

1. Interpolation augmentation significantly improves S2T performance.

2. Effectiveness is consistent across different architectures and data scales.

3. Proper strategy selection is crucial for maximizing benefits.

Abstract

Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
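
The idea of building virtual training samples by interpolating inputs and labels can be illustrated with a minimal, generic mixup-style sketch. This is not the paper's implementation: the function names, the zero-padding of the shorter utterance, the Beta(alpha, alpha) sampling of the mixing weight, and the choice to interpolate per-sample losses (as a stand-in for interpolating sequence labels) are all assumptions made for illustration.

```python
import random


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw a mixing weight from Beta(alpha, alpha) (assumed, as in standard mixup)."""
    return rng.betavariate(alpha, alpha)


def mix_features(x_a, x_b, lam):
    """Interpolate two utterances given as lists of equal-width feature frames.

    The shorter utterance is zero-padded to the longer one's length
    (a hypothetical choice; other alignment strategies are possible).
    """
    dim = len(x_a[0])
    n_frames = max(len(x_a), len(x_b))
    pad = [0.0] * dim
    mixed = []
    for t in range(n_frames):
        frame_a = x_a[t] if t < len(x_a) else pad
        frame_b = x_b[t] if t < len(x_b) else pad
        mixed.append([lam * a + (1.0 - lam) * b for a, b in zip(frame_a, frame_b)])
    return mixed


def mix_losses(loss_a: float, loss_b: float, lam: float) -> float:
    """Weight the losses against each original transcript by the same lam.

    For variable-length text targets this plays the role of label interpolation.
    """
    return lam * loss_a + (1.0 - lam) * loss_b


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_lambda(alpha=0.5, rng=rng)
    utt_a = [[1.0, 2.0]]              # one frame, two features
    utt_b = [[3.0, 4.0], [5.0, 6.0]]  # two frames, two features
    virtual = mix_features(utt_a, utt_b, lam)
    print(len(virtual), mix_losses(2.0, 4.0, lam))
```

In a training loop, the mixed features would be fed to the model once, and the loss would be computed against both original transcripts and combined with `mix_losses`.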

Code & Models

Repositories

xuchennlp/s2t (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques