Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Jian Luo; Jianzong Wang; Ning Cheng; Jing Xiao

arXiv:2107.04227 · eess.AS · July 12, 2021

Open Access

TL;DR

This paper introduces two dropout regularization techniques for transformer-based self-supervised speech learning, which help prevent overfitting and improve downstream task performance.

Contribution

It proposes attention and layer dropout methods specifically designed for transformer speech models, enhancing their ability to utilize global information.

Findings

01 Improved phoneme classification accuracy.

02 Enhanced speaker recognition performance.

03 Dropout methods prevent overfitting in self-supervised learning.

Abstract

Predicting altered acoustic frames is an effective approach to self-supervised learning of speech representations. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we propose introducing two dropout regularization methods into the pretraining of a transformer encoder: (1) attention dropout and (2) layer dropout. Both methods encourage the model to use global speech information and to avoid simply copying local spectrum features when reconstructing the masked frames. We evaluate the proposed methods on phoneme classification and speaker recognition tasks. The experiments demonstrate that our dropout approaches achieve competitive results and improve classification accuracy on downstream tasks.
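The abstract names the two methods without giving their exact form, so the following is only a minimal sketch of the generic ideas, not the authors' implementation. It assumes attention dropout means randomly zeroing softmax attention weights (with inverted-dropout rescaling), and layer dropout means skipping a whole residual encoder layer at random during training. The function names `attention_dropout` and `encode_with_layer_dropout`, the drop probability `p`, and the toy layer representation are all hypothetical.

```python
import math
import random


def attention_dropout(scores, p, rng):
    """Softmax over one row of attention scores, then zero each weight
    with probability p. Surviving weights are scaled by 1 / (1 - p).

    Assumption: this is the common form of attention dropout; the
    paper's variant may differ. Expects 0 <= p < 1.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    if p <= 0.0:
        return weights
    keep = 1.0 - p
    return [w / keep if rng.random() < keep else 0.0 for w in weights]


def encode_with_layer_dropout(x, layers, p, rng, training=True):
    """Apply residual layers in order, skipping each one with
    probability p during training.

    Assumption: each element of `layers` is a callable returning the
    residual update for x (a hypothetical stand-in for an encoder layer).
    """
    for layer in layers:
        if training and rng.random() < p:
            continue  # the whole layer is dropped; x passes through unchanged
        x = [xi + di for xi, di in zip(x, layer(x))]
    return x
```

With either function, setting `p` to 0 recovers the unregularized computation, and `training=False` disables layer skipping at inference time. Dropping attention weights or whole layers removes some paths the model could otherwise use to copy nearby frames, which is one way to read the abstract's claim about encouraging the use of global information.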

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Dropout