Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency
Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

TL;DR
This paper introduces Layer Consistency to enable efficient training and inference of Conformer-based self-supervised speech models, achieving significant speedups and parameter reduction without performance loss.
Contribution
It proposes Layer Consistency to allow layer sampling and shallow inference, reducing computation and parameters in Transformer-based speech models.
Findings
7.8X parameter reduction
41.9% training speedup
37.7% inference speedup
Abstract
Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance. However, both the training and inference process of these models may encounter prohibitively high computational cost and large parameter budget. Although Parameter Sharing Strategy (PSS) proposed in ALBERT paves the way for parameter reduction, the computation required remains the same. Interestingly, we found in experiments that distributions of feature embeddings from different Transformer layers are similar when PSS is integrated: a property termed as Layer Consistency (LC) in this paper. Given this similarity of feature distributions, we assume that feature embeddings from different layers would have similar representing power. In this work, Layer Consistency enables us to adopt Transformer-based models in a more…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Topic Modeling · Text and Document Classification Technologies
MethodsMulti-Head Attention · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Refunds@Expedia|||How do I get a full refund from Expedia? · Adam · Label Smoothing · Byte Pair Encoding · Dropout · Softmax
