HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Wei-Ning Hsu; Benjamin Bolte; Yao-Hung Hubert Tsai; Kushal Lakhotia; Ruslan Salakhutdinov; Abdelrahman Mohamed

arXiv:2106.07447·cs.CL·June 15, 2021

TL;DR

HuBERT is a self-supervised speech representation learning method that predicts cluster-derived hidden-unit labels for masked regions of the input. Because the prediction loss is applied over the masked regions only, the model has to learn a combined acoustic and language model. It matches or improves on wav2vec 2.0 on Librispeech benchmarks.

Contribution

The paper presents HuBERT, which uses an offline clustering step to provide aligned target labels for a BERT-like masked prediction loss, matching or outperforming previous methods such as wav2vec 2.0.

Findings

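01. HuBERT matches or outperforms wav2vec 2.0 on Librispeech benchmarks.

02. Targets from an offline clustering step improve speech representation learning; the consistency of the clustering matters more than the intrinsic quality of the assigned labels.

03. HuBERT achieves up to 19% relative WER reduction on the more challenging evaluation subsets.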

Abstract

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss. A key ingredient of our approach is applying the prediction loss over the masked regions only, which forces the model to learn a combined acoustic and language model over the continuous inputs. HuBERT relies primarily on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels.…
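
The two steps the abstract describes (offline clustering to produce frame-aligned targets, then a prediction loss over masked regions only) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans_labels`, `span_mask`, and `masked_cross_entropy` are hypothetical, the frames are toy 2-D vectors rather than real acoustic features, the cluster count, span length, and masking probability are arbitrary stand-ins rather than values from the paper, and random logits take the place of a trained model.

```python
import math
import random

def kmeans_labels(frames, k, iters=10, seed=0):
    """Offline clustering: give every frame the index of its nearest centroid."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(f, centroids[c])))
            for f in frames
        ]
        # Update step: move each centroid to the mean of its member frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def span_mask(num_frames, span=3, start_prob=0.2, seed=0):
    """Pick random start frames and mask `span` consecutive frames from each."""
    rng = random.Random(seed)
    mask = [False] * num_frames
    for t in range(num_frames):
        if rng.random() < start_prob:
            for u in range(t, min(t + span, num_frames)):
                mask[u] = True
    return mask

def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over masked frames only; unmasked frames are ignored."""
    losses = []
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        top = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        losses.append(log_z - row[target])
    return sum(losses) / len(losses) if losses else 0.0

# Toy run: 20 random 2-D "frames", 4 clusters, random logits as a model stand-in.
rng = random.Random(1)
frames = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
targets = kmeans_labels(frames, k=4)
mask = span_mask(len(frames))
logits = [[rng.gauss(0, 1) for _ in range(4)] for _ in frames]
print("masked frames:", sum(mask), "loss:", masked_cross_entropy(logits, targets, mask))
```

Because the loss skips unmasked frames, the model gets no credit for copying labels of frames it can see; it has to infer the hidden units of masked spans from the surrounding context, which is the property the abstract highlights.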

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout · Dense Connections