SimLens for Early Exit in Large Language Models: Eliciting Accurate Latent Predictions with One More Token

Ming Ma; Bowen Zheng; Zhongqiao Lin; Tianming Yang

arXiv:2507.17618·cs.CL·March 17, 2026

TL;DR

SimLens introduces a training-free decoding method that significantly improves early-layer predictions in large language models, enabling more accurate and efficient early exit strategies.

Contribution

The paper presents SimLens, a lightweight, training-free decoder that improves the accuracy of intermediate-layer predictions in LLMs. It pairs SimLens with Linear SimLens, a linear approximation used for entropy-based confidence estimation, in a hybrid early-exit mechanism called SimExit.

Findings

01. SimLens improves Iso-Compute accuracy on ARC, BoolQ, and HeadQA with LLaMA-7B and Vicuna-7B.

02. SimExit achieves up to 1.40× speedup with minimal accuracy loss.

03. Ablation studies reveal distinct roles for the start and answer tokens.

Abstract

Intermediate-layer predictions in large language models (LLMs) are informative but hard to decode accurately, especially at early layers. Existing lens-style methods typically rely on direct linear readout, which is simple but often drifts away from the model's eventual prediction. We propose SimLens, a simple training-free decoder for single-token decision tasks that keeps only the start token and a candidate answer token ([s] and [a]) and performs one lightweight continuation through the remaining upper layers. This surprisingly small modification recovers much more accurate latent predictions than direct linear decoding. We further introduce Linear SimLens, a lightweight linear approximation for entropy-based confidence estimation, and combine the two in SimExit, a hybrid early-exit mechanism. On ARC, BoolQ, and HeadQA with LLaMA-7B and Vicuna-7B, SimLens improves Iso-Compute accuracy…
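The two ideas in the abstract, a two-token continuation through the upper layers and an entropy-based exit test, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`continue_with_two_tokens`, `first_confident_layer`), the representation of upper layers as plain callables, the assumption that [s] is the first position and [a] the last, and the fixed entropy threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(probs):
    """Shannon entropy in nats; probabilities are clipped to avoid log(0)."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def continue_with_two_tokens(hidden, upper_layers, readout):
    """Hypothetical sketch of a two-token continuation.

    hidden:       (seq_len, d) hidden states at some intermediate layer.
    upper_layers: list of callables mapping (2, d) -> (2, d); stand-ins
                  for the remaining layers (assumption, not a real API).
    readout:      (d, vocab) matrix mapping hidden states to logits.
    Assumes [s] sits at position 0 and [a] at the last position.
    """
    kept = hidden[[0, -1]]          # keep only the [s] and [a] positions
    for layer in upper_layers:      # run the remaining layers on two tokens
        kept = layer(kept)
    return kept[-1] @ readout       # read logits off the [a] position

def first_confident_layer(per_layer_logits, threshold):
    """Exit at the first layer whose predictive entropy is below threshold.

    Falls back to the last layer when no layer is confident enough.
    """
    for i, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            return i, int(np.argmax(probs))
    last = len(per_layer_logits) - 1
    return last, int(np.argmax(per_layer_logits[last]))
```

With per-layer logits `[[0, 0, 0, 0], [1, 0.5, 0, 0], [6, 0, 0, 0]]` and a threshold of 0.5 nats, `first_confident_layer` skips the first two layers, whose distributions are still close to uniform, and exits at index 2 with class 0. In a real system the threshold would need tuning, since it sets the trade-off between speedup and accuracy.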

Taxonomy

Topics: Topic Modeling