The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning

Henry Han; Xiyang Liu; Xiaodong Wang; Fei Han; Xiaodong Li

arXiv:2602.13595 · cs.AI · May 4, 2026

TL;DR

This paper shows that, in multi-hop reasoning, reducing numerical precision from 16-bit to 8-bit or 4-bit can paradoxically increase net energy consumption and degrade accuracy. Hardware casting overhead and dequantization latency offset the expected savings, breaking the assumed linear scaling of energy with bit width.

Contribution

The paper identifies a 'quantization trap' in multi-hop reasoning, gives a theoretical decomposition of its causes, and formalizes a Critical Model Scale $N^{*}$ that predicts when the linear scaling law fails.

Findings

01. Reducing precision from 16-bit to 8/4-bit increases energy use and degrades accuracy.

02. Hardware casting overhead and dequantization latency are key bottlenecks.

03. A Critical Model Scale predicts when the quantization trap occurs across hardware and model sizes.

Abstract

Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile ($E \propto \text{bits}$). In this paper, we demonstrate that this scaling law breaks in the context of multi-hop reasoning. We reveal a 'quantization trap' where reducing precision from 16-bit to 8/4-bit paradoxically increases net energy consumption while degrading reasoning accuracy. We provide a rigorous theoretical decomposition that attributes this failure to hardware casting overhead, the hidden latency cost of dequantization kernels, which becomes a dominant bottleneck in sequential reasoning chains, as well as to a sequential energy amortization failure. As a result, scaling law breaking is unavoidable in practice. We formalize a Critical Model Scale $N^{*}$ that predicts when the trap dissolves or deepens as a…
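
To make the idealized "energy proportional to bits" expectation and the effect of a casting cost concrete, here is a minimal toy sketch in Python. It is not the paper's model. The function names, the constants `e_bit` and `cast_overhead`, and the assumption that dequantization adds a fixed per-step cost independent of model size are all hypothetical, chosen only for illustration. In particular, `break_even_params` is a toy break-even point and should not be read as the paper's Critical Model Scale $N^{*}$, whose definition is not reproduced here.

```python
# Toy illustration only: every constant and functional form below is an
# assumption made for this sketch, not a quantity taken from the paper.

def energy_per_step(n_params: float, bits: int,
                    e_bit: float, cast_overhead: float) -> float:
    """Energy for one reasoning step, in arbitrary units.

    linear   : the idealized E ∝ bits term (scales with model size).
    overhead : assumed fixed dequantization/casting cost, paid only
               when running below 16-bit precision.
    """
    linear = e_bit * n_params * bits
    overhead = cast_overhead if bits < 16 else 0.0
    return linear + overhead


def chain_energy(n_params: float, bits: int, hops: int,
                 e_bit: float, cast_overhead: float) -> float:
    """Total energy for a sequential chain of `hops` reasoning steps."""
    return hops * energy_per_step(n_params, bits, e_bit, cast_overhead)


def break_even_params(bits: int, e_bit: float, cast_overhead: float) -> float:
    """Model size at which the toy low-precision run matches 16-bit energy.

    Solves e_bit * N * 16 == e_bit * N * bits + cast_overhead for N.
    """
    if not 0 < bits < 16:
        raise ValueError("bits must be between 1 and 15 for this comparison")
    return cast_overhead / (e_bit * (16 - bits))


if __name__ == "__main__":
    E_BIT = 1.0          # hypothetical energy per parameter-bit
    CAST = 8e9           # hypothetical fixed casting cost per step
    for n in (5e8, 2e9):  # two hypothetical model sizes
        e16 = chain_energy(n, 16, hops=4, e_bit=E_BIT, cast_overhead=CAST)
        e8 = chain_energy(n, 8, hops=4, e_bit=E_BIT, cast_overhead=CAST)
        print(f"N={n:.0e}: 8-bit/16-bit energy ratio = {e8 / e16:.2f}")
```

Under these assumed constants, a model below the toy break-even size spends more energy at 8-bit than at 16-bit, while a larger one spends less. The number of hops scales both totals equally here, so the sketch does not capture the sequential energy amortization failure or the accuracy loss described in the abstract.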
