Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices

Maxence Lasbordes; Daniele Falavigna; Alessio Brutti

arXiv:2506.18035 · cs.CL · June 24, 2025

TL;DR

Splitformer introduces parallel processing layers with downsampled inputs to enhance early-exit speech recognition models, achieving better accuracy on benchmarks with minimal parameter increase and no impact on inference time.
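
The minimal pure-Python sketch below shows what a parallel layer over a downsampled input might look like. It is not the Splitformer implementation, because this page does not describe the layer in detail. Every specific here is a hypothetical choice made for illustration: the names `downsample`, `upsample` and `split_layer`, average pooling for downsampling, frame repetition for upsampling, one scalar feature per frame, and merging the two branches by summation.

```python
from typing import Callable, List

Frames = List[float]  # hypothetical: one scalar feature per frame, for readability


def downsample(frames: Frames, factor: int) -> Frames:
    """Average-pool consecutive groups of `factor` frames (the last group may be shorter)."""
    return [
        sum(frames[i:i + factor]) / len(frames[i:i + factor])
        for i in range(0, len(frames), factor)
    ]


def upsample(frames: Frames, factor: int, length: int) -> Frames:
    """Repeat each frame `factor` times, then trim back to the original `length`."""
    return [f for f in frames for _ in range(factor)][:length]


def split_layer(frames: Frames,
                main_branch: Callable[[Frames], Frames],
                side_branch: Callable[[Frames], Frames],
                factor: int = 2) -> Frames:
    """Hypothetical parallel layer: a full-rate branch plus a branch on a downsampled copy."""
    full = main_branch(frames)
    # The side branch processes only about len(frames) / factor frames.
    coarse = upsample(side_branch(downsample(frames, factor)), factor, len(frames))
    # Merge by element-wise sum (an assumption, not the paper's combination rule).
    return [a + b for a, b in zip(full, coarse)]
```

For example, `split_layer([1.0, 3.0, 5.0, 7.0, 9.0], lambda x: x, lambda x: [0.5 * v for v in x])` returns `[2.0, 4.0, 8.0, 10.0, 13.5]`. Because the side branch sees only half as many frames, this kind of design costs less than adding another full-rate layer. The two branches are also independent, so they could run side by side.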

Contribution

The paper proposes a novel architecture that combines early-exit strategies with parallel downsampling layers to improve speech recognition performance.
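
The early-exit side can be sketched the same way. The model runs its layers one at a time with an exit head after each, and it stops as soon as an exit's prediction is confident enough or a layer budget is reached (for example, a budget set by the device's current resources). Several choices here are hypothetical and do not reproduce the paper's exit criterion: the max-probability confidence rule, the `threshold` and `max_layers` parameters, and the use of a single logit vector per input, where a real ASR model would produce per-frame posteriors.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_forward(
    x: Vector,
    blocks: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float = 0.9,
    max_layers: Optional[int] = None,
) -> Tuple[int, Vector]:
    """Return (number of layers used, posterior at the exit taken).

    Hypothetical exit rule: leave at the first branch whose top probability
    reaches `threshold`, or at the `max_layers` budget, whichever comes first.
    """
    if not blocks or len(blocks) != len(exit_heads):
        raise ValueError("need one exit head per block")
    budget = len(blocks) if max_layers is None else min(max_layers, len(blocks))
    if budget < 1:
        raise ValueError("max_layers must be at least 1")
    for depth in range(1, budget + 1):
        x = blocks[depth - 1](x)
        probs = softmax(exit_heads[depth - 1](x))
        # Layers above this exit are never run, as if removed from the model.
        if max(probs) >= threshold or depth == budget:
            return depth, probs
    raise AssertionError("unreachable: the loop always returns at the budget")
```

Consider three blocks that each double their input, identity exit heads, and `x = [1.0, 0.0]`. The first exit's top probability is about 0.88, so with the default threshold of 0.9 the call returns at layer 2. Passing `max_layers=1` forces an exit after the first layer regardless of confidence.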

Findings

01. Significant performance improvement on standard benchmarks.
02. Minimal increase in model parameters.
03. No change in inference time.

Abstract

The ability to dynamically adjust the computational load of neural models during inference in a resource-aware manner is crucial for on-device processing scenarios, which are characterised by limited and time-varying computational resources. Early-exit architectures represent an elegant and effective solution, since they can process the input with a subset of their layers, exiting at intermediate branches (the uppermost layers are hence removed from the model). From a different perspective, for automatic speech recognition applications there are memory-efficient neural architectures that apply variable frame rate analysis, through downsampling/upsampling operations in the middle layers, reducing the overall number of operations and significantly improving performance on well-established benchmarks. One example is the Zipformer. However, these architectures lack the modularity necessary to…
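
Simple arithmetic gives a rough estimate of the savings from variable frame rate analysis. The sketch below makes two hypothetical assumptions. First, a layer's cost scales with the number of frames it processes raised to `power`: 1 for frame-wise feed-forward work, roughly 2 for full self-attention. Second, the downsampling factors follow an illustrative U-shaped schedule. It does not reproduce the Zipformer's actual configuration and is not a measurement.

```python
from typing import Sequence


def relative_cost(downsample_factors: Sequence[int], power: float = 1.0) -> float:
    """Cost of a layer stack relative to running every layer at full frame rate.

    Hypothetical cost model: a layer that keeps 1/f of the frames costs
    (1/f) ** power of a full-rate layer.
    """
    if not downsample_factors:
        raise ValueError("need at least one layer")
    per_layer = [(1.0 / f) ** power for f in downsample_factors]
    return sum(per_layer) / len(per_layer)


# Hypothetical U-shaped schedule: downsample towards the middle, upsample back.
schedule = [1, 2, 4, 8, 4, 2]
print(relative_cost(schedule))           # linear cost model, prints 0.4375
print(relative_cost(schedule, power=2))  # quadratic cost model, prints 0.2734375
```

Under the linear model, this schedule does about 44% of the full-rate work, and under the quadratic model about 27%. Real savings also depend on how many layers each stack holds and on costs this model ignores, such as the downsampling and upsampling operations themselves.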

Code & Models

Repositories

augustgw/early-exit-transformer
torch · Official

Taxonomy

Methods: Attentive Walk-Aggregating Graph Neural Network · Parallel Layers