Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Guanrou Yang; Ziyang Ma; Zhisheng Zheng; Yakun Song; Zhikang Niu; Xie Chen

arXiv:2309.13860 · cs.CL · October 2, 2023


TL;DR

Fast-HuBERT is an optimized self-supervised speech representation learning framework that cuts pre-training time by a factor of 5.2 relative to the original HuBERT implementation without performance degradation, making speech processing research more efficient.

Contribution

The paper introduces Fast-HuBERT, a set of efficiency optimizations for HuBERT that drastically reduce training time while maintaining accuracy.

Findings

01

Training time reduced by 5.2x on Librispeech 960h relative to the original implementation

02

Pre-training completed in 1.1 days on 8 V100 GPUs

03

Maintained performance levels comparable to original HuBERT

Abstract

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied…
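
To put the headline figures in perspective, they can be converted into GPU-hours. The sketch below uses only the numbers quoted above (1.1 days, 8 V100 GPUs, 5.2x). Reading the 5.2x factor as a ratio of GPU-hours on comparable hardware is an assumption made here for illustration, not something the page states, so the implied baseline figure is indicative only.

```python
# Headline figures quoted on this page (Fast-HuBERT, Librispeech 960h).
FAST_HUBERT_DAYS = 1.1  # wall-clock pre-training time in days
NUM_GPUS = 8            # number of V100 GPUs
SPEEDUP = 5.2           # reported speedup over the original implementation


def gpu_hours(days: float, gpus: int) -> float:
    """Convert wall-clock days on `gpus` devices into total GPU-hours."""
    return days * 24 * gpus


fast_cost = gpu_hours(FAST_HUBERT_DAYS, NUM_GPUS)

# Assumption (not from the paper): treat the 5.2x speedup as a GPU-hour ratio
# on comparable hardware to estimate the cost of the original implementation.
implied_baseline = fast_cost * SPEEDUP

print(f"Fast-HuBERT: {fast_cost:.1f} V100 GPU-hours")  # prints 211.2
print(f"Implied baseline (assumption): {implied_baseline:.0f} V100 GPU-hours")  # prints 1098
```

The first number follows directly from the stated setup. The second holds only if the speedup was measured in like-for-like GPU-hours.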

Code & Models

Repositories

yanghaha0908/fasthubert (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling