BitSkip: An Empirical Analysis of Quantization and Early Exit Composition in Transformers

Ramshankar Bhuvaneswaran; Handan Liu

arXiv:2510.23766 · cs.CL · March 23, 2026


TL;DR

BitSkip systematically explores quantization and early exit strategies in transformers, revealing that simple 8-bit quantization without complex transforms can outperform more intricate methods and even rival full-precision models in language modeling tasks.

Contribution

Introduces BitSkip, a hybrid framework for analyzing the interactions of quantization and early exit techniques in transformers, highlighting the surprising effectiveness of simple 8-bit quantization.

Findings

01

An 8-bit quantized model without the Hadamard transform outperforms both 4-bit and Hadamard-enhanced models.

02

Hadamard transforms at 8-bit cause severe training instability, degrading performance by over 37,000%.

03

Exiting early at layer 18 offers a 32.5% speed gain with only 4% quality loss.

Abstract

The pursuit of efficient Large Language Models (LLMs) has led to increasingly complex techniques like extreme quantization and dynamic routing. While the individual benefits of these methods are well-documented, their compositional effects remain poorly understood. This paper introduces BitSkip, a hybrid architectural framework for systematically exploring these interactions. Counter-intuitively, our findings reveal that a simple 8-bit quantized model without the Hadamard transform (BitSkip-V1) not only outperforms its more complex 4-bit and Hadamard-enhanced counterparts but also competes with the full-precision baseline in quality (perplexity of 1.13 vs 1.19). The introduction of Hadamard transforms, even at 8-bit precision, catastrophically degraded performance by over 37,000%, which we trace to fundamental training instability. Our BitSkip-V1 recipe demonstrates superior early-exit characteristics, with…
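
For readers unfamiliar with the two ingredients the abstract contrasts, the following is a minimal sketch of symmetric 8-bit quantization applied with and without a Hadamard rotation. It is not the authors' implementation: the per-tensor absmax scheme, the function names (`quantize_absmax_int8`, `dequantize`, `hadamard`, `max_abs_error`), and the toy input vector are all assumptions made for illustration. It shows only the round-trip arithmetic on a single vector and says nothing about the training instability the paper reports.

```python
import math

def quantize_absmax_int8(values):
    """Symmetric per-tensor 8-bit quantization (assumed scheme): floats -> ints in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to floats using the stored scale."""
    return [c * scale for c in codes]

def hadamard(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart within each block of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]

def max_abs_error(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical activation vector with one large outlier.
x = [0.02, -0.01, 0.03, 4.0, -0.02, 0.01, 0.00, -0.03]

# Plain 8-bit quantization.
codes, scale = quantize_absmax_int8(x)
x_plain = dequantize(codes, scale)

# Rotate, quantize, dequantize, rotate back.
# The orthonormal Hadamard matrix is symmetric, so it is its own inverse.
rot_codes, rot_scale = quantize_absmax_int8(hadamard(x))
x_rot = hadamard(dequantize(rot_codes, rot_scale))

print("plain   max error:", max_abs_error(x, x_plain))
print("rotated max error:", max_abs_error(x, x_rot))
```

The rotation spreads the outlier's energy across all coordinates before quantization, which is the usual motivation for Hadamard-based schemes. Whether that helps or hurts once it is placed inside a training loop is the empirical question the paper examines.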
