Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

Jinchuan Tian; Rongzhi Gu; Helin Wang; Yuexian Zou

arXiv:2105.00812 · cs.CL · May 4, 2021

TL;DR

This paper introduces Layer Consistency to enable efficient training and inference of Conformer-based self-supervised speech models, achieving significant speedups and parameter reduction without performance loss.

Contribution

It proposes Layer Consistency to allow layer sampling and shallow inference, reducing computation and parameters in Transformer-based speech models.

Findings

01. 7.8X parameter reduction

02. 41.9% training speedup

03. 37.7% inference speedup

Abstract

Transformer-based self-supervised models are trained as feature extractors and have enabled many downstream speech tasks to achieve state-of-the-art performance. However, both training and inference with these models can incur prohibitively high computational cost and a large parameter budget. Although the Parameter Sharing Strategy (PSS) proposed in ALBERT paves the way for parameter reduction, the computation required remains the same. Interestingly, we found in experiments that the distributions of feature embeddings from different Transformer layers are similar when PSS is integrated: a property termed Layer Consistency (LC) in this paper. Given this similarity of feature distributions, we assume that feature embeddings from different layers would have similar representational power. In this work, Layer Consistency enables us to adopt Transformer-based models in a more…
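To make the abstract's point about PSS concrete, the following is a minimal sketch, not the authors' implementation. It uses a toy residual layer as a hypothetical stand-in for a Transformer or Conformer block, and all names (`make_layer`, `apply_layer`, `forward`, `count_params`, `DIM`, `NUM_LAYERS`) are assumptions introduced for illustration. It shows that reusing one layer's weights at every depth shrinks the parameter count while the per-layer computation stays the same, and that a shared stack can be run at a shallower depth. The paper's actual layer sampling and shallow inference procedures are not described in the truncated abstract and are not reproduced here.

```python
import math
import random


def make_layer(dim, seed=0):
    """Build one toy layer (hypothetical stand-in for a Transformer block)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return {"weight": weight, "bias": bias}


def apply_layer(layer, x):
    """Residual update: x + tanh(W x + b)."""
    out = []
    for i, row in enumerate(layer["weight"]):
        pre = sum(w * v for w, v in zip(row, x)) + layer["bias"][i]
        out.append(x[i] + math.tanh(pre))
    return out


def forward(layers, x, depth=None):
    """Apply the first `depth` layers in order (all layers if depth is None)."""
    depth = len(layers) if depth is None else depth
    for layer in layers[:depth]:
        x = apply_layer(layer, x)
    return x


def count_params(layers):
    """Count parameters, counting each distinct layer object only once."""
    unique = {id(layer): layer for layer in layers}
    return sum(
        len(layer["weight"]) * len(layer["weight"][0]) + len(layer["bias"])
        for layer in unique.values()
    )


DIM, NUM_LAYERS = 8, 12  # illustrative sizes, not taken from the paper

# Without sharing: every depth has its own weights.
unshared = [make_layer(DIM, seed=i) for i in range(NUM_LAYERS)]

# ALBERT-style sharing: the same layer object is reused at every depth.
shared_layer = make_layer(DIM, seed=0)
shared = [shared_layer] * NUM_LAYERS

x = [0.1 * i for i in range(DIM)]
full_output = forward(shared, x)              # 12 layer applications
shallow_output = forward(shared, x, depth=4)  # 4 layer applications, same weights

print("unshared params:", count_params(unshared))
print("shared params:  ", count_params(shared))
```

In this toy setup the shared stack stores one layer's worth of parameters regardless of depth, yet a full forward pass still applies the layer `NUM_LAYERS` times, which is the cost the abstract says PSS alone does not remove. Passing a smaller `depth` only illustrates that a shared stack can be truncated; whether the truncated output is as useful as the full one is the empirical claim the paper makes through Layer Consistency.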

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Text and Document Classification Technologies

Methods: Multi-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Byte Pair Encoding · Dropout · Softmax