Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training
Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli

TL;DR
This study investigates how individual domain factors, such as phonetics and syntax, influence the effectiveness of self-supervised pre-training for speech recognition, and finds that phonetic factors are the most impactful.
Contribution
The paper provides a controlled analysis of individual domain factors in speech pre-training, highlighting the importance of phonetic aspects over syntactic or grammatical ones.
Findings
Phonetic factors play an important role in pre-training effectiveness.
Grammatical and syntactic factors are less influential.
To the authors' knowledge, the first study to dissect individual domain factors in self-supervised speech pre-training.
Abstract
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain…
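To make "a single domain factor modified" concrete, here is a minimal sketch of one hypothetical way to ablate syntax in text before speech synthesis. It shuffles word order, which keeps the words (and so, roughly, the phonetic content) while destroying grammatical structure. The names `shuffle_word_order` and `build_conditions` are illustrative assumptions, not the authors' pipeline. The resulting text would then go to a TTS system (not shown) to produce pre-training audio.

```python
import random
from typing import Dict, List


def shuffle_word_order(sentence: str, seed: int = 0) -> str:
    """Hypothetical syntax ablation: randomly permute the words of a sentence.

    The multiset of words is unchanged, so a TTS system would produce
    roughly the same phones, but word order (syntax) is destroyed.
    """
    words = sentence.split()
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    rng.shuffle(words)
    return " ".join(words)


def build_conditions(corpus: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Return a baseline corpus and a variant that differs only in word order.

    This is an illustrative controlled-study setup (an assumption), where each
    condition changes exactly one factor relative to the baseline.
    """
    return {
        "baseline": list(corpus),
        "shuffled_syntax": [
            shuffle_word_order(s, seed + i) for i, s in enumerate(corpus)
        ],
    }


if __name__ == "__main__":
    conditions = build_conditions(["the cat sat on the mat"])
    print(conditions["shuffled_syntax"][0])
```

Pre-training one model per condition and fine-tuning each identically isolates the effect of the changed factor. A phonetic ablation would instead alter the sounds while keeping the text structure fixed.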
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and dialogue systems
