Layer Pruning on Demand with Intermediate CTC
Jaesong Lee, Jingu Kang, Shinji Watanabe

TL;DR
This paper introduces a method for pruning layers of CTC-based end-to-end speech recognition models at run-time. Model depth can be reduced on demand without retraining, which suits resource-constrained devices.
Contribution
It proposes a training and pruning approach that combines intermediate CTC and stochastic depth, so that model depth can be adjusted at run-time with little loss in accuracy.
Findings
Each pruned sub-model maintains accuracy comparable to an individually trained model of the same depth.
Real-time factor improved from 0.005 to 0.002 on GPU.
Layer pruning can be performed on demand without additional fine-tuning.
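To put the real-time factor figures in context: RTF is conventionally defined as processing time divided by audio duration, so lower is faster. The sketch below works through that arithmetic. The function names are hypothetical, and the paper's exact measurement setup is not described on this page.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Conventional RTF: time spent decoding divided by audio length."""
    return processing_seconds / audio_seconds


def speedup(rtf_before, rtf_after):
    """How many times faster decoding becomes when RTF drops."""
    return rtf_before / rtf_after


# Using the figures reported above (0.005 before pruning, 0.002 after).
ratio = speedup(0.005, 0.002)
print(f"speedup: {ratio:.1f}x")
```

Under this definition, going from 0.005 to 0.002 corresponds to roughly a 2.5x reduction in processing time for the same audio.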
Abstract
Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since the device's computational power and energy consumption requirements change dynamically in practice. To overcome this issue, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) that allows reduction of model depth at run-time without any extra fine-tuning. To achieve this goal, we adopt two regularization methods, intermediate CTC and stochastic depth, to train a model whose performance does not degrade much after pruning. We present an in-depth analysis of layer behaviors using singular vector canonical correlation analysis (SVCCA), and efficient strategies for finding layers which are safe to prune. Using the proposed method, we show that a Transformer-CTC model can be pruned to various depths on demand,…
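The general idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a residual layer stack, a stochastic-depth variant that skips a layer's branch with a fixed probability during training (output rescaling is omitted for brevity), and an intermediate-CTC objective that mixes the final loss with the mean of the intermediate losses using a hypothetical weight of 0.3. All names and values are illustrative, and plain numbers stand in for real CTC losses.

```python
import random


def make_layer(scale):
    """Toy residual branch standing in for a Transformer encoder layer."""
    return lambda x: [scale * v for v in x]


def encode(x, layers, depth=None, drop_prob=0.0, rng=None):
    """Run a residual stack and return the output after each kept layer.

    depth: number of bottom layers to keep (None keeps all); the upper
        layers are pruned simply by not running them.
    drop_prob: stochastic-depth skip probability (training only).
    """
    rng = rng or random.Random(0)
    kept = layers if depth is None else layers[:depth]
    outputs = []
    for layer in kept:
        skip = drop_prob > 0.0 and rng.random() < drop_prob
        if not skip:
            branch = layer(x)
            x = [a + b for a, b in zip(x, branch)]  # residual connection
        outputs.append(x)
    return outputs


def combined_loss(final_loss, inter_losses, weight=0.3):
    """Mix the final-layer loss with the mean intermediate loss (assumed form)."""
    inter = sum(inter_losses) / len(inter_losses)
    return (1.0 - weight) * final_loss + weight * inter


layers = [make_layer(0.5) for _ in range(4)]
x = [1.0, 2.0]
full = encode(x, layers)             # all 4 layers
pruned = encode(x, layers, depth=2)  # on-demand pruning to 2 layers
```

Because the pruned pass runs exactly the same bottom layers, its output equals the intermediate output of the full model at that depth. This is why training that intermediate output with its own CTC loss plausibly makes the shallower sub-model usable without fine-tuning.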
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Pruning
