ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
Vishwanath Pratap Singh, Federico Malato, Ville Hautamaki, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper introduces a reinforcement learning (RL) approach that uses a deep Q-network (DQN) to dynamically adjust the ratio of original to augmented data during wav2vec2.0-based ASR training, improving performance over fixed ratios.
Contribution
It proposes a novel RL-based method that dynamically optimizes the original-to-augmented data ratio in ASR training, moving beyond heuristic fixed ratios.
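To make "original-to-augmented data ratio" concrete, here is a minimal sketch that builds one mini-batch whose split between original and augmented utterances follows a given ratio. It rests on two assumptions that are not taken from the paper: original and augmented examples live in separate pools, and the ratio is read as n_original / n_augmented, so 1.0 means half and half. The function name `mix_batch` is hypothetical. The paper may apply augmentation differently, for example on the fly per utterance.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_batch(original: Sequence[T], augmented: Sequence[T], oar: float,
              batch_size: int, rng: random.Random) -> List[T]:
    """Hypothetical helper: sample a batch whose original:augmented split follows `oar`.

    Assumes `oar` = n_original / n_augmented (e.g. 1.0 -> half/half, 3.0 -> 3:1).
    """
    if oar < 0:
        raise ValueError("oar must be non-negative")
    # Convert the ratio into the fraction of the batch drawn from original data.
    frac_original = oar / (1.0 + oar)
    n_orig = round(batch_size * frac_original)
    n_aug = batch_size - n_orig
    # Sample without replacement from each pool, then shuffle the combined batch.
    batch = rng.sample(list(original), n_orig) + rng.sample(list(augmented), n_aug)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    originals = [("orig", i) for i in range(20)]
    augmenteds = [("aug", i) for i in range(20)]
    print(mix_batch(originals, augmenteds, oar=3.0, batch_size=8, rng=rng))
```

A fixed-OAR baseline would call `mix_batch` with the same `oar` at every step; a dynamic scheme, such as the RL controller described in the abstract, would instead pick a new `oar` as training progresses.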
Findings
Achieves an average relative improvement of 4.96% on the LibriSpeech test sets.
Demonstrates effectiveness across training-data sizes (the 10Min, 1H, 10H, and 100H LibriSpeech splits).
Shows that the RL-based dynamic approach outperforms fixed original-to-augmented data ratio (OAR) baselines.
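The 4.96% figure is a relative improvement, not an absolute one. The sketch below shows the arithmetic under two assumptions. First, the metric is word error rate (WER). Second, the average is taken over per-test-set relative improvements. This is one common convention, and the paper may aggregate differently. The WER values are hypothetical and are not results from the paper.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_improvement(baseline_wer: float, proposed_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


def average_relative_improvement(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-test-set relative improvements (assumed aggregation)."""
    return mean(relative_improvement(b, p) for b, p in pairs)


if __name__ == "__main__":
    # Hypothetical (baseline WER %, proposed WER %) pairs, for illustration only.
    example = [(10.0, 9.5), (20.0, 19.0)]
    print(average_relative_improvement(example))
```

Because the improvement is relative, the same absolute WER drop counts for more on a test set with a lower baseline WER.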
Abstract
While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one such heuristic, namely how much augmented data to include in ASR training, by introducing a reinforcement learning (RL) based dynamic adjustment of the original-to-augmented data ratio (OAR). Unlike the fixed OAR used in conventional data augmentation, our proposed method employs a deep Q-network (DQN) as the RL mechanism to learn the optimal OAR dynamics throughout wav2vec2.0-based ASR training. We conduct experiments on the LibriSpeech dataset with varying amounts of training data, specifically the 10Min, 1H, 10H, and 100H splits, to evaluate the efficacy of the proposed method under different data conditions. Our proposed method, on average, achieves a relative improvement of…
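The abstract describes learning OAR dynamics with a DQN. As a minimal sketch of the underlying idea only, the block below swaps in plain tabular Q-learning over a handful of discrete OAR levels. A DQN would replace the Q-table with a neural network, typically trained with experience replay and a target network. Everything here is an assumption for illustration rather than the paper's formulation. That includes the OAR levels, the decrease/keep/increase action space, the use of the current OAR level as the state, and especially `toy_reward`, which stands in for whatever training signal a real system would use (for example, a change in validation loss).

```python
import random
from typing import List

# Hypothetical discrete OAR levels (original:augmented); not taken from the paper.
OAR_LEVELS = [0.25, 0.5, 1.0, 2.0, 4.0]
# Hypothetical action space: step the OAR level down, keep it, or step it up.
ACTIONS = (-1, 0, 1)


def toy_reward(oar: float) -> float:
    """Hypothetical reward standing in for an ASR training signal.

    It simply peaks at OAR = 1.0 so the example has a known optimum.
    """
    return -abs(oar - 1.0)


def step(level: int, action: int) -> int:
    """Apply an action to the OAR level index, clipped to the valid range."""
    return min(max(level + action, 0), len(OAR_LEVELS) - 1)


def greedy_action(q: List[List[float]], level: int) -> int:
    """Index of the highest-valued action (ties go to the lowest index)."""
    return max(range(len(ACTIONS)), key=lambda a: q[level][a])


def train_q_table(episodes: int = 500, horizon: int = 10, alpha: float = 0.5,
                  gamma: float = 0.9, epsilon: float = 0.1,
                  seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in OAR_LEVELS]
    for _ in range(episodes):
        level = rng.randrange(len(OAR_LEVELS))  # random starting OAR level
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy_action(q, level)  # exploit
            nxt = step(level, ACTIONS[a])
            reward = toy_reward(OAR_LEVELS[nxt])
            # Q-learning target: r + gamma * max_a' Q(s', a')
            target = reward + gamma * max(q[nxt])
            q[level][a] += alpha * (target - q[level][a])
            level = nxt
    return q


def greedy_schedule(q: List[List[float]], start_level: int, steps: int) -> List[float]:
    """Roll out the learned greedy policy and return the OAR used at each step."""
    level, schedule = start_level, []
    for _ in range(steps):
        level = step(level, ACTIONS[greedy_action(q, level)])
        schedule.append(OAR_LEVELS[level])
    return schedule


if __name__ == "__main__":
    q_table = train_q_table()
    print(greedy_schedule(q_table, start_level=0, steps=4))
```

With this toy reward, the greedy rollout moves toward OAR = 1.0 and then holds it. In real training the reward would come from the ASR model itself, so a learned schedule could keep changing over the course of training instead of settling on one value.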
