SimLens for Early Exit in Large Language Models: Eliciting Accurate Latent Predictions with One More Token
Ming Ma, Bowen Zheng, Zhongqiao Lin, Tianming Yang

TL;DR
SimLens introduces a training-free decoding method that significantly improves early-layer predictions in large language models, enabling more accurate and efficient early exit strategies.
Contribution
The paper presents SimLens, a novel lightweight decoder for early exit in LLMs that enhances prediction accuracy without additional training, and combines it with confidence estimation in a hybrid mechanism.
Findings
SimLens improves Iso-Compute accuracy on ARC, BoolQ, and HeadQA with LLaMA-7B and Vicuna-7B.
SimExit achieves up to 1.40× speedup with minimal accuracy loss.
Ablation studies reveal distinct roles for start and answer tokens.
Abstract
Intermediate-layer predictions in large language models (LLMs) are informative but hard to decode accurately, especially at early layers. Existing lens-style methods typically rely on direct linear readout, which is simple but often drifts away from the model's eventual prediction. We propose SimLens, a simple training-free decoder for single-token decision tasks that keeps only the start token and a candidate answer token ([s] and [a]) and performs one lightweight continuation through the remaining upper layers. This surprisingly small modification recovers much more accurate latent predictions than direct linear decoding. We further introduce Linear SimLens, a lightweight linear approximation for entropy-based confidence estimation, and combine the two in SimExit, a hybrid early-exit mechanism. On ARC, BoolQ, and HeadQA with LLaMA-7B and Vicuna-7B, SimLens improves Iso-Compute accuracy…
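To make the idea of entropy-based confidence estimation for early exit concrete, here is a minimal sketch of generic entropy thresholding over per-layer answer logits. It is not the authors' SimExit implementation: the function names (`softmax`, `entropy`, `early_exit`), the toy logits, and the threshold values are all assumptions chosen for illustration, and the per-layer logits are taken as given rather than produced by SimLens or Linear SimLens.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(layer_logits, threshold):
    """Return (exit_layer, predicted_answer_index).

    layer_logits: one list of candidate-answer logits per layer, ordered
    from the earliest layer to the final layer (hypothetical input format).
    threshold: exit at the first layer whose entropy falls below this value
    (hypothetical hyperparameter). If no layer qualifies, the final layer's
    prediction is used.
    """
    for layer, logits in enumerate(layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            return layer, max(range(len(probs)), key=probs.__getitem__)
    probs = softmax(layer_logits[-1])
    return len(layer_logits) - 1, max(range(len(probs)), key=probs.__getitem__)


# Toy logits over four answer options at three successive layers.
toy_layers = [
    [0.1, 0.0, 0.2, 0.1],  # nearly uniform: low confidence
    [1.0, 0.5, 3.0, 0.2],  # option 2 starts to dominate
    [0.0, 0.0, 9.0, 0.0],  # sharply peaked: high confidence
]
exit_layer, answer = early_exit(toy_layers, threshold=1.0)
```

A looser threshold exits earlier and saves more compute at the risk of committing to a prediction that later layers would revise; a stricter one defers to deeper layers.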
Taxonomy
Topics: Topic Modeling
