persoDA: Personalized Data Augmentation for Personalized ASR

Pablo Peso Parada, Spyros Fontalis, Md Asif Jalal, Karthikeyan Saravanan, Anastasios Drosou, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung

arXiv:2501.09113 · eess.AS · January 20, 2025

TL;DR

This paper introduces persoDA, a personalized data augmentation method that uses user-specific data to improve speech recognition accuracy and convergence speed on mobile devices.

Contribution

The paper proposes persoDA, a novel data augmentation technique tailored to individual users, enhancing personalization and efficiency in ASR models.

Findings

01 · 13.9% relative WER reduction compared with standard MCT-based augmentation
02 · 16% to 20% faster convergence with persoDA
03 · Effective personalization of ASR models on mobile devices
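Finding 01 is stated as a relative reduction, which is conventionally measured against the baseline's own WER rather than as a difference in percentage points. The symbols below are ours, not the paper's, and the page does not report the absolute WER values, so none are filled in. WER itself counts substitutions S, deletions D and insertions I against N reference words:

$$
\mathrm{WER} = \frac{S + D + I}{N}, \qquad
\Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{MCT}} - \mathrm{WER}_{\mathrm{persoDA}}}{\mathrm{WER}_{\mathrm{MCT}}} \times 100\%
$$

For instance, a purely hypothetical baseline WER of 10.0% would fall to 10.0 × (1 − 0.139) = 8.61% under a 13.9% relative reduction.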

Abstract

Data augmentation (DA) is ubiquitously used in the training of Automatic Speech Recognition (ASR) models. DA increases data variability and improves robustness and generalization against different acoustic distortions. Recently, personalizing ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA, a DA method driven by the user's data that is used to personalize ASR. persoDA aims to augment training with data specifically tuned towards the acoustic characteristics of the end user, as opposed to standard augmentation based on Multi-Condition Training (MCT), which applies random reverberation and noise. Our evaluation with a conformer-based ASR baseline trained on LibriSpeech and personalized for VOICES shows that persoDA achieves a 13.9% relative WER reduction over using standard data augmentation (using…
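The abstract contrasts MCT, which applies random reverberation and noise, with augmentation tuned to the end user's acoustic conditions. The sketch below illustrates only that contrast and is not the persoDA algorithm: this summary does not say how persoDA derives its augmentation from user data. It assumes, hypothetically, that per-user noise clips, room impulse responses (RIRs) and a typical SNR have already been estimated. All function names, the 0 to 20 dB MCT range and the ±2 dB jitter are illustrative choices.

```python
# Illustrative sketch only; NOT the persoDA algorithm from the paper.
import math
import random


def signal_power(x):
    """Mean power of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def convolve(signal, rir):
    """Simulate reverberation: causal convolution with a room impulse
    response, truncated to the length of the input signal."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k in range(min(len(rir), i + 1)):
            acc += rir[k] * signal[i - k]
        out.append(acc)
    return out


def add_noise_at_snr(clean, noise, snr_db):
    """Add noise to `clean`, scaled so the mixture has the requested SNR (dB).
    The noise clip is tiled or cropped to match the clean length."""
    noise = [noise[i % len(noise)] for i in range(len(clean))]
    gain = math.sqrt(signal_power(clean) / (signal_power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def mct_augment(clean, noise_bank, rir_bank, rng, snr_range=(0.0, 20.0)):
    """MCT-style: random RIR, random noise clip, uniformly random SNR.
    The default SNR range is a hypothetical choice."""
    reverberant = convolve(clean, rng.choice(rir_bank))
    return add_noise_at_snr(reverberant, rng.choice(noise_bank), rng.uniform(*snr_range))


def user_matched_augment(clean, user_noises, user_rirs, user_snr_db, rng, jitter_db=2.0):
    """Hypothetical user-matched variant: conditions are drawn from per-user
    estimates (noise clips, RIRs, typical SNR) instead of a generic bank."""
    reverberant = convolve(clean, rng.choice(user_rirs))
    snr = user_snr_db + rng.uniform(-jitter_db, jitter_db)
    return add_noise_at_snr(reverberant, rng.choice(user_noises), snr)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 0.1 s, 440 Hz tone at 16 kHz standing in for an utterance.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
    small_room = [1.0, 0.0, 0.3, 0.0, 0.1]  # toy RIR: direct path plus two echoes
    generic = mct_augment(clean, [noise], [[1.0], small_room], rng)
    personal = user_matched_augment(clean, [noise], [small_room], 8.0, rng)
    print(len(generic), len(personal))
```

The two augmenters share the same mixing code. They differ only in where the reverberation and noise conditions come from: a generic random bank in the MCT case, or the user's own (here assumed) conditions in the personalized case.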

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Fault Detection and Control Systems