Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

George August Wright; Umberto Cappellazzo; Salah Zaiem; Desh Raj; Lucas Ondel Yang; Daniele Falavigna; Mohamed Nabih Ali; Alessio Brutti

arXiv:2309.09546 · eess.AS · February 23, 2024 · 1 citation

Open Access · 1 repository · 1 model

TL;DR

This paper investigates training strategies for early-exit architectures in self-attention-based automatic speech recognition models. It demonstrates that training from scratch improves performance and explores exit selection based on posterior probabilities.

Contribution

The paper compares fine-tuning pre-trained models with training from scratch for early-exit ASR models, and shows that training from scratch improves recognition performance.

Findings

01 Early-exit models trained from scratch outperform fine-tuned models.

02 Scratch-trained models maintain performance with fewer encoder layers.

03 Posterior probability-based exit selection is effective.
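
The third finding can be illustrated with a small sketch of posterior-based exit selection. The scoring rule used here (the average, over frames, of the highest class probability) and the threshold value are assumptions chosen for illustration; this summary does not state the exact confidence measure used in the paper. The names `mean_max_posterior` and `select_exit` are hypothetical.

```python
from typing import Sequence


def mean_max_posterior(frame_posteriors: Sequence[Sequence[float]]) -> float:
    """Average, over frames, of the highest class probability in each frame.

    Assumed confidence measure for illustration only.
    """
    if not frame_posteriors:
        raise ValueError("need at least one frame")
    return sum(max(frame) for frame in frame_posteriors) / len(frame_posteriors)


def select_exit(
    exit_posteriors: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.9,
) -> int:
    """Return the index of the first exit whose confidence reaches the threshold.

    Falls back to the last (deepest) exit if no exit is confident enough.
    """
    if not exit_posteriors:
        raise ValueError("need at least one exit")
    for index, frames in enumerate(exit_posteriors):
        if mean_max_posterior(frames) >= threshold:
            return index
    return len(exit_posteriors) - 1


# Toy posteriors: three exits, two frames each, three output classes.
toy_posteriors = [
    [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],        # shallow exit, low confidence
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],        # middle exit
    [[0.95, 0.03, 0.02], [0.97, 0.02, 0.01]],  # deepest exit, high confidence
]
chosen_exit = select_exit(toy_posteriors, threshold=0.9)
```

In this sketch all exit posteriors are precomputed for simplicity. An on-device implementation would more plausibly evaluate the exits one at a time and stop running encoder layers as soon as an exit is accepted.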

Abstract

The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit architectures, in which additional exit branches are appended to intermediate layers of the encoder. In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands. Previous research on early-exiting ASR models has relied on pre-trained self-supervised models, fine-tuned with an early-exit loss. In this paper, we undertake an experimental comparison between fine-tuning pre-trained backbones and training models from scratch with the early-exiting…
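
The structure described in the abstract, with exit branches attached to intermediate encoder layers, can be sketched as follows. This is a toy, framework-free illustration and not the authors' implementation: the class `EarlyExitEncoder`, its methods, and the stand-in layers are hypothetical, and real encoder layers would be self-attention blocks rather than simple functions.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]
ExitBranch = Callable[[Vector], float]


class EarlyExitEncoder:
    """Toy encoder with one exit branch after every layer (hypothetical)."""

    def __init__(self, layers: Sequence[Layer], exits: Sequence[ExitBranch]):
        if len(layers) != len(exits):
            raise ValueError("each layer needs exactly one exit branch")
        self.layers = list(layers)
        self.exits = list(exits)

    def forward(self, features: Vector, max_layers: int) -> float:
        """Run only the first `max_layers` layers, then decode with the
        exit branch attached to the last layer that was run."""
        if not 1 <= max_layers <= len(self.layers):
            raise ValueError("max_layers out of range")
        hidden = features
        for layer in self.layers[:max_layers]:
            hidden = layer(hidden)
        return self.exits[max_layers - 1](hidden)

    def forward_all(self, features: Vector) -> List[float]:
        """Return the output of every exit, as would be needed when all
        exits contribute to a training loss."""
        outputs = []
        hidden = features
        for layer, exit_branch in zip(self.layers, self.exits):
            hidden = layer(hidden)
            outputs.append(exit_branch(hidden))
        return outputs


# Stand-in layers add 1.0 to every element; stand-in exits sum the vector.
toy_layers = [lambda h: [x + 1.0 for x in h] for _ in range(3)]
toy_exits = [lambda h: sum(h) for _ in range(3)]
encoder = EarlyExitEncoder(toy_layers, toy_exits)

small_budget_output = encoder.forward([0.0, 1.0], max_layers=1)
all_exit_outputs = encoder.forward_all([0.0, 1.0])
```

The `max_layers` argument stands in for the available compute budget: a device under load passes a smaller value and gets an output from a shallower exit, while the same model weights serve every budget.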

Code & Models

Repositories

augustgw/early-exit-transformer (torch, official)

Models

SpeechTek/Italian-EE-conformer (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems