Two-dimensional early exit optimisation of LLM inference

Jan Hůla; David Adamczyk; Tomáš Filip; Martin Pavlíček; Petr Sosík

arXiv:2604.18592 · cs.CL · April 22, 2026

TL;DR

This paper proposes a two-dimensional early exit method for large language model inference that combines layer-wise and sentence-wise exiting for classification tasks, reducing computation beyond what layer-wise early exit alone achieves while maintaining accuracy.

Contribution

The paper introduces a 2D early exit approach that coordinates layer-wise and sentence-wise exits, achieving greater efficiency than optimizing either dimension alone across four LLMs and three sentiment classification datasets.
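Why coordinating the two dimensions can yield multiplicative savings can be illustrated with a deliberately simplified cost model. The sketch assumes that compute scales with (tokens processed) × (layers executed), ignoring the quadratic attention term and adapter overhead. Under that assumption, reading only a fraction of the input and stopping at a fraction of the depth combine as a product. The helper `relative_cost` and the fractions 0.6 and 0.5 are hypothetical illustrations, not figures from the paper.

```python
def relative_cost(token_fraction: float, layer_fraction: float) -> float:
    """Cost relative to a full forward pass.

    Assumption: cost is proportional to tokens processed x layers executed,
    so the two savings multiply. Real models deviate from this (attention is
    quadratic in length, adapters add overhead).
    """
    if not (0.0 < token_fraction <= 1.0 and 0.0 < layer_fraction <= 1.0):
        raise ValueError("fractions must lie in (0, 1]")
    return token_fraction * layer_fraction


if __name__ == "__main__":
    # Hypothetical example: layer-wise exit alone stops at half the depth...
    layer_only = relative_cost(token_fraction=1.0, layer_fraction=0.5)
    # ...while a 2D exit also stops after reading 60% of the input.
    two_d = relative_cost(token_fraction=0.6, layer_fraction=0.5)
    print(f"layer-wise only: {layer_only:.2f}")            # 0.50
    print(f"2D exit:         {two_d:.2f}")                 # 0.30
    print(f"additional speed-up: {layer_only / two_d:.2f}x")  # 1.67x
```

The "additional speed-up" here is the ratio of the layer-wise-only cost to the 2D cost, which is the kind of comparison the findings report; under this toy model it equals the inverse of the token fraction.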

Findings

01

Achieves an additional 1.4–2.3× speed-up over optimal layer-wise early exit on simpler tasks with vanilla models.

02

Effective across four state-of-the-art LLMs (3B–8B parameters) and three sentiment classification datasets.

03

The speed-up advantage degrades gracefully on complex multi-class problems.

Abstract

We introduce a two-dimensional (2D) early exit strategy that coordinates layer-wise and sentence-wise exiting for classification tasks in large language models. By processing input incrementally sentence-by-sentence while progressively activating deeper layers, our method achieves multiplicative computational savings that exceed those from optimizing either dimension independently. Experimental evaluation across four state-of-the-art LLMs (Llama 3.1, Llama 3.2, Gemma, Qwen; 3B–8B parameters) on three sentiment classification datasets demonstrates additional speed-ups of 1.4–2.3× over optimal layer-wise early exit for simpler tasks with vanilla models, with graceful degradation on complex multi-class problems. Fine-tuning reduces but does not eliminate this advantage. The approach is model-agnostic, requires only lightweight classification adapters, and is orthogonal to…
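The control flow described in the abstract (read one more sentence, allow deeper layers, exit once a classification adapter is confident) can be sketched as a loop over two dimensions. This is a minimal sketch under stated assumptions, not the authors' algorithm: the `Probe` interface, the linear depth schedule `layers_per_step`, the confidence `threshold`, and the synthetic `toy_probe` are all hypothetical. For clarity it re-queries the probe from layer 1 at every sentence step and ignores reuse of cached activations.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical adapter interface: given the number of sentences read so far and
# the layer whose hidden state is inspected, return class probabilities.
Probe = Callable[[int, int], Sequence[float]]


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda i: probs[i])


def two_d_early_exit(
    num_sentences: int,
    num_layers: int,
    probe: Probe,
    threshold: float = 0.9,
    layers_per_step: int = 4,
) -> Tuple[int, int, int]:
    """Return (predicted class, sentences read, exit layer).

    Sentence dimension: the input is consumed one sentence at a time.
    Layer dimension: the allowed depth grows with each sentence (a hypothetical
    linear schedule), and the adapter is checked at every allowed layer.
    """
    for s in range(1, num_sentences + 1):
        depth = min(num_layers, s * layers_per_step)
        for layer in range(1, depth + 1):
            probs = probe(s, layer)
            if max(probs) >= threshold:
                return argmax(probs), s, layer
    # No confident exit: fall back to the full input at full depth.
    probs = probe(num_sentences, num_layers)
    return argmax(probs), num_sentences, num_layers


def toy_probe(sentences_read: int, layer: int) -> Sequence[float]:
    # Synthetic binary "sentiment" probe whose confidence rises with both
    # context (sentences read) and depth (layer index).
    conf = min(0.99, 0.4 + 0.025 * layer + 0.05 * sentences_read)
    return [conf, 1.0 - conf]


if __name__ == "__main__":
    label, sentences, layer = two_d_early_exit(
        num_sentences=5, num_layers=32, probe=toy_probe, threshold=0.91
    )
    print(label, sentences, layer)  # 0 4 13
```

With the toy probe, the loop exits after 4 of 5 sentences at layer 13 of 32, so it skips both the last sentence and the deeper layers. A real system would replace `toy_probe` with adapters reading the model's hidden states and would tune the schedule and threshold per task.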
