ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR

Vishwanath Pratap Singh; Federico Malato; Ville Hautamaki; Md. Sahidullah; Tomi Kinnunen

arXiv:2406.09999·eess.AS·June 17, 2024·Interspeech

Open Access

TL;DR

This paper introduces a reinforcement learning approach using a deep Q-network to dynamically adjust the ratio of original to augmented data during wav2vec2.0 based ASR training, improving performance over fixed ratios.

Contribution

It proposes a novel RL-based method to optimize data augmentation ratios dynamically in ASR training, moving beyond heuristic fixed ratios.

Findings

01 Achieves an average relative improvement of 4.96% on the LibriSpeech test sets.

02 Demonstrates effectiveness across various training data sizes.

03 Validates the RL approach as superior to methods that use a fixed original-to-augmented data ratio (OAR).
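
To make the "relative improvement" figure concrete, the sketch below shows how such a number is usually computed. It assumes the metric is word error rate (WER), the usual ASR metric, which this summary does not name explicitly. The function name and the sample values are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    A positive value means the new system has a lower error rate
    than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers for illustration only (not taken from the paper):
# a baseline at 10.0% WER and a new system at 9.5% WER.
example = relative_improvement(baseline_wer=10.0, new_wer=9.5)
print(f"relative improvement: {example:.2f}%")
```

Note that a relative improvement is measured against the baseline error rate, so it is larger than the absolute difference between the two error rates whenever the baseline is below 100%.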

Abstract

While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of these heuristics, namely choosing the right amount of augmented data in ASR training, by introducing a reinforcement learning (RL) based dynamic adjustment of the original-to-augmented data ratio (OAR). Unlike the fixed OAR approach in conventional data augmentation, our proposed method employs a deep Q-network (DQN) as the RL mechanism to learn the optimal dynamics of OAR throughout the wav2vec2.0 based ASR training. We conduct experiments using the LibriSpeech dataset with varying amounts of training data, specifically the 10Min, 1H, 10H, and 100H splits, to evaluate the efficacy of the proposed method under different data conditions. Our proposed method, on average, achieves a relative improvement of…
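
The idea of adjusting the OAR during training can be sketched as an agent that repeatedly picks a mixing ratio and is rewarded according to how training responds. The sketch below is not the paper's implementation: it swaps the deep Q-network for plain tabular Q-learning to stay dependency-free, and the action set, the state (a coarse training stage), the reward function, and all names are assumptions made for illustration.

```python
import random

# Assumption: each action is the fraction of a batch drawn from original data.
OAR_ACTIONS = [0.25, 0.5, 0.75, 1.0]


def mix_batch(original, augmented, original_fraction, batch_size, rng):
    """Build one batch with the requested share of original samples."""
    n_orig = round(original_fraction * batch_size)
    batch = rng.sample(original, n_orig) + rng.sample(augmented, batch_size - n_orig)
    rng.shuffle(batch)
    return batch


class OarAgent:
    """Epsilon-greedy tabular Q-learning (a stand-in for the paper's DQN)."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Explore with probability epsilon, otherwise take the best-known action."""
        row = self.q[state]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward reward + gamma * max Q(next_state)."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])


def toy_reward(stage, original_fraction):
    """Hypothetical reward: pretend each training stage has a preferred ratio."""
    preferred = [0.25, 0.5, 0.75][stage]
    return -abs(original_fraction - preferred)


# Toy loop: three coarse training stages stand in for the real training state.
agent = OarAgent(n_states=3, n_actions=len(OAR_ACTIONS))
for _ in range(500):
    for stage in range(3):
        action = agent.act(stage)
        reward = toy_reward(stage, OAR_ACTIONS[action])
        agent.update(stage, action, reward, min(stage + 1, 2))

# Example of turning a chosen ratio into an actual batch of (placeholder) samples.
rng = random.Random(0)
original = [f"orig_{i}" for i in range(20)]
augmented = [f"aug_{i}" for i in range(20)]
batch = mix_batch(original, augmented, OAR_ACTIONS[2], batch_size=8, rng=rng)
```

In a real setup the reward would come from the ASR training signal, for example a change in validation loss or error rate, and the state would summarise training progress; both are left abstract here because this summary does not specify them.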

Taxonomy

Topics: Medical Imaging Techniques and Applications · Fault Detection and Control Systems · Advanced X-ray and CT Imaging

Methods: Balanced Selection