ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
Yiming Wang, Jinyu Li

TL;DR
ResidualTransformer reparameterizes each Transformer layer weight as a full-rank component shared with adjacent layers plus a layer-specific low-rank component, reducing Transformer encoder size by about 3X on speech tasks with minimal performance loss.
Contribution
The paper proposes a novel residual low-rank learning approach with weight-sharing for Transformer layers, inspired by ResNet and LoRA, to compress models efficiently.
Findings
Transformer encoder size reduced by ~3X
Minimal performance degradation
Effective on large-scale speech tasks
Abstract
The memory constraint of always-on devices is one of the major concerns when deploying speech processing models on these devices. While larger models trained with a sufficiently large amount of data generally perform better, making them fit in the device memory is a demanding challenge. In this paper, we aim to reduce model size by reparameterizing model weights across Transformer encoder layers and assuming a special weight composition and structure. More specifically, inspired by ResNet and the more recent LoRA work, we propose an approach named ResidualTransformer, where each weight matrix in a Transformer layer comprises 1) a full-rank component shared with its adjacent layers, and 2) a low-rank component unique to itself. The low-rank matrices account for only a small increase in model size. In addition, we add diagonal weight matrices to improve the modeling capacity of the low-rank…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Multi-Head Attention · Average Pooling · Kaiming Initialization · 1x1 Convolution · Batch Normalization · Convolution · Dense Connections · Linear Layer · Label Smoothing
