Adaptive Dropout for Pruning Conformers
Yotaro Kubo, Xingyu Cai, Michiel Bacchiani

TL;DR
This paper introduces an adaptive-dropout-based pruning method for Conformers that jointly trains and prunes the network at several points in each Conformer block, reducing the number of parameters by 54% while also improving speech recognition accuracy.
Contribution
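The paper presents a novel adaptive dropout technique in which each unit has its own retention probability, estimated by back-propagation with the Gumbel-Softmax technique. Units with small retention probabilities can then be pruned, so a Conformer model is trained and pruned jointly.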
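To make the unit-wise retention idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical per-unit logit t_i with retention probability p_i = sigmoid(t_i) and uses the two-class (binary) Gumbel-Softmax relaxation so that the sampled mask is differentiable with respect to t_i. The temperature `tau`, the hard 0.5 threshold used at inference, and the absence of output rescaling or a sparsity regularizer are illustrative assumptions, not details taken from the paper. The names `relaxed_mask` and `adaptive_dropout` are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def retention_probs(logits: List[float]) -> List[float]:
    """Map hypothetical per-unit logits to retention probabilities p_i = sigmoid(t_i)."""
    return [sigmoid(t) for t in logits]


def relaxed_mask(logits: List[float], tau: float,
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample a relaxed keep/drop mask with the two-class Gumbel-Softmax trick.

    With logits (t, 0) for (keep, drop), softmax((t + g1) / tau, g2 / tau)[keep]
    equals sigmoid((t + g1 - g2) / tau), and g1 - g2 ~ Logistic(0, 1), which is
    sampled as log(u) - log(1 - u). Returns the mask values and d(mask)/d(t),
    the gradient back-propagation would use to update each retention probability.
    """
    masks, grads = [], []
    for t in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
        noise = math.log(u) - math.log(1.0 - u)
        m = sigmoid((t + noise) / tau)
        masks.append(m)
        grads.append(m * (1.0 - m) / tau)  # reparameterized gradient w.r.t. t
    return masks, grads


def adaptive_dropout(x: List[float], logits: List[float], tau: float = 0.5,
                     training: bool = True, threshold: float = 0.5,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Apply unit-wise adaptive dropout to a single activation vector."""
    if training:
        masks, _ = relaxed_mask(logits, tau, rng or random.Random())
    else:
        # Assumed inference rule: keep units whose retention probability reaches
        # the threshold; the others are the pruning candidates.
        masks = [1.0 if p >= threshold else 0.0 for p in retention_probs(logits)]
    return [xi * mi for xi, mi in zip(x, masks)]


if __name__ == "__main__":
    print(adaptive_dropout([1.0, 2.0, 3.0], [4.0, -4.0, 0.5], training=False))
    # prints [1.0, 0.0, 3.0]
```

The point of the relaxation is that the noise enters additively before a smooth sigmoid, so the mask has a well-defined gradient with respect to each unit's logit; a hard Bernoulli sample would not.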
Findings
Achieved a 54% parameter reduction in Conformers.
Improved the word error rate by approximately 1%.
Demonstrated effectiveness on the LibriSpeech speech recognition task.
Abstract
This paper proposes a method to effectively perform joint training-and-pruning based on adaptive dropout layers with unit-wise retention probabilities. The proposed method is based on the estimation of a unit-wise retention probability in a dropout layer. A unit that is estimated to have a small retention probability can be considered to be prunable. The retention probability of the unit is estimated using back-propagation and the Gumbel-Softmax technique. This pruning method is applied at several application points in Conformers such that the effective number of parameters can be significantly reduced. Specifically, adaptive dropout layers are introduced in three locations in each Conformer block: (a) the hidden layer of the feed-forward-net component, (b) the query vectors and the value vectors of the self-attention component, and (c) the input vectors of the LConv component. The…
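Once retention probabilities have been learned, pruning the hidden layer of the feed-forward-net component (application point (a)) can be pictured as deleting the rows of the first projection and the columns of the second projection that belong to low-retention units. The plain-Python sketch below assumes a weight layout, a 0.5 threshold, and a toy 2-input, 4-hidden-unit FFN, all chosen for illustration; `prune_ffn`, `param_count`, and `ffn_forward` are hypothetical names. It uses ReLU for brevity, whereas Conformer FFNs use Swish; any elementwise activation gives the same equivalence. The query/value and LConv application points are not covered here.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def prune_ffn(w_in: Matrix, b_in: List[float], w_out: Matrix,
              retention: List[float], threshold: float = 0.5
              ) -> Tuple[Matrix, List[float], Matrix]:
    """Drop FFN hidden units whose retention probability is below `threshold`.

    Assumed layout: w_in is d_ff x d_model (one row per hidden unit),
    b_in has d_ff entries, and w_out is d_model x d_ff (one column per unit).
    """
    keep = [i for i, p in enumerate(retention) if p >= threshold]
    new_w_in = [list(w_in[i]) for i in keep]               # rows of the first projection
    new_b_in = [b_in[i] for i in keep]                     # matching hidden biases
    new_w_out = [[row[i] for i in keep] for row in w_out]  # columns of the second projection
    return new_w_in, new_b_in, new_w_out


def param_count(w_in: Matrix, b_in: List[float], w_out: Matrix) -> int:
    """Count weights and hidden biases (an output bias would be unaffected by pruning)."""
    return sum(map(len, w_in)) + len(b_in) + sum(map(len, w_out))


def ffn_forward(x: List[float], w_in: Matrix, b_in: List[float], w_out: Matrix,
                mask: Optional[List[float]] = None) -> List[float]:
    """Two-layer FFN; ReLU stands in for Swish to keep the sketch short."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_in, b_in)]
    if mask is not None:  # hard 0/1 mask derived from the retention probabilities
        h = [hi * mi for hi, mi in zip(h, mask)]
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_out]


if __name__ == "__main__":
    w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    b_in = [0.0, 0.1, 0.0, 0.2]
    w_out = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
    retention = [0.9, 0.1, 0.8, 0.05]  # toy values, not from the paper
    small = prune_ffn(w_in, b_in, w_out, retention)
    mask = [1.0 if p >= 0.5 else 0.0 for p in retention]
    print(param_count(w_in, b_in, w_out), param_count(*small))  # prints 20 10
    print(ffn_forward([1.0, 2.0], w_in, b_in, w_out, mask))     # prints [10.0, 2.0]
    print(ffn_forward([1.0, 2.0], *small))                      # prints [10.0, 2.0]
```

The check in the usage block is the property that makes this kind of pruning attractive: a unit whose mask is fixed at zero contributes nothing downstream, so physically removing it leaves the output unchanged while shrinking both projections.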
Taxonomy
Topics: Logic, programming, and type systems · Teaching and Learning Programming · Distributed and Parallel Computing Systems
Methods: Dropout · Pruning · Adaptive Dropout
