Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation

Chendong Zhao; Jianzong Wang; Xiaoyang Qu; Haoqian Wang; Jing Xiao

arXiv:2210.08182 · cs.SD · November 1, 2022

TL;DR

This paper proposes a method for learning domain-invariant speech representations that improve accent adaptation and recognition performance by directly mapping speech to high-level linguistic features.

Contribution

It introduces a novel approach for unsupervised learning of invariant speech representations that enhance adaptation to accented speech domains.

Findings

01. Learned representations capture articulatory features of phonemes.

02. The learned representations adapt better to accented speech domains.

03. The method outperforms the baseline on accented speech benchmarks.

Abstract

Unsupervised representation learning for speech audio has attained impressive performance on speech recognition tasks, particularly when annotated speech is limited. However, the unsupervised paradigm needs to be carefully designed, and little is known about what properties these representations acquire. There is no guarantee that the model learns meaningful representations of the information that is valuable for recognition. Moreover, the adaptation ability of the learned representations to other domains still needs to be estimated. In this work, we explore learning domain-invariant representations via a direct mapping of speech representations to their corresponding high-level linguistic information. Results show that the learned latents not only capture the articulatory features of each phoneme but also enhance the adaptation ability, outperforming the baseline by a large margin on accented benchmarks.

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing