persoDA: Personalized Data Augmentation for Personalized ASR
Pablo Peso Parada, Spyros Fontalis, Md Asif Jalal, Karthikeyan Saravanan, Anastasios Drosou, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung

TL;DR
This paper introduces persoDA, a personalized data augmentation method that uses user-specific data to improve speech recognition accuracy and convergence speed on mobile devices.
Contribution
The paper proposes persoDA, a novel data augmentation technique tailored to individual users, enhancing personalization and efficiency in ASR models.
Findings
13.9% relative WER reduction over standard data augmentation (conformer-based baseline trained on Librispeech and personalized for VOICES)
16% to 20% faster convergence with persoDA
Effective personalization of ASR models on mobile devices
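For readers unfamiliar with the metric, a relative WER reduction is the drop in WER expressed as a fraction of the baseline WER. The sketch below shows only the arithmetic; the function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the formula (not from the paper):
# a baseline WER of 20.0% falling to 15.0% is a 25% relative reduction.
print(relative_wer_reduction(20.0, 15.0))
```

Note that a relative reduction is not the same as an absolute one: in the illustrative case above the absolute drop is 5 percentage points.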
Abstract
Data augmentation (DA) is ubiquitously used in the training of Automatic Speech Recognition (ASR) models. DA offers increased data variability, robustness and generalization against different acoustic distortions. Recently, personalization of ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA, a DA method driven by the user's data used to personalize ASR. persoDA aims to augment training with data specifically tuned towards the acoustic characteristics of the end-user, as opposed to standard augmentation based on Multi-Condition Training (MCT), which applies random reverberation and noises. Our evaluation with an ASR conformer-based baseline trained on Librispeech and personalized for VOICES shows that persoDA achieves a 13.9% relative WER reduction over using standard data augmentation (using…
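As a rough illustration of the standard MCT-style baseline the abstract contrasts against (random reverberation plus random noise), the following is a minimal sketch using NumPy only. It does not reproduce persoDA or the paper's pipeline: the synthetic room impulse response, the white-noise source, the RT60 and SNR ranges, and all function names are assumptions made for illustration.

```python
import numpy as np


def synthetic_rir(rng, sample_rate=16000, rt60=0.4):
    """Toy room impulse response: white noise with a 60 dB decay over rt60 seconds."""
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude falls by 60 dB at t = rt60
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then mix."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise


def mct_augment(speech, rng, sample_rate=16000):
    """Apply a randomly drawn reverberation and noise condition (assumed ranges)."""
    rt60 = rng.uniform(0.2, 0.8)     # assumed reverberation-time range, seconds
    snr_db = rng.uniform(5.0, 20.0)  # assumed SNR range, dB
    rir = synthetic_rir(rng, sample_rate, rt60)
    reverberant = np.convolve(speech, rir)[: len(speech)]
    noise = rng.standard_normal(len(speech))  # stand-in for a recorded noise clip
    return add_noise_at_snr(reverberant, noise, snr_db)


# Usage: augment one second of a 440 Hz tone standing in for a speech signal.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
augmented = mct_augment(speech, rng)
print(augmented.shape)
```

The point of the contrast in the abstract is that these conditions are drawn at random, independent of the user, whereas persoDA is described as tuning the augmentation towards the end-user's acoustic characteristics.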
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Fault Detection and Control Systems
