BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models

Sai Babu Patarlapalli; Surya Teja Avvaru

arXiv:2605.05561·cs.AI·May 8, 2026

TL;DR

BitCal-TTS is a runtime controller for quantized reasoning models that improves accuracy and reduces premature stopping without fine-tuning, using online uncertainty proxies and bit-aware confidence rescaling.

Contribution

The paper introduces a lightweight method, requiring no fine-tuning, for adaptive test-time scaling in 4-bit quantized models, aimed at improving reasoning accuracy and efficiency.

Findings

01

Improves exact-match accuracy on GSM8K evaluation shards at 7B and 14B scales.

02

Reduces premature stopping rate from 14.8% to 11.1% on 7B models.

03

Maintains substantial token savings compared to fixed-budget decoding.

Abstract

Post-training quantization makes large reasoning models practical under tight memory and latency budgets, but it can distort the online signals that drive adaptive test-time compute allocation. Under a fixed cap on the number of newly generated tokens, miscalibrated confidence can lead to harmful early halting: the model may surface a plausible final line while the underlying reasoning is still wrong, or the controller may stop before the trace has stabilized. We study this interaction for greedy 4-bit inference and propose BitCal-TTS, a lightweight runtime controller that combines (i) inexpensive online proxies for token-level uncertainty and reasoning-trace stability, (ii) a bit-conditioned confidence rescaling that is conservative at low nominal precision, and (iii) a bit-aware post-marker confirmation horizon designed for GSM8K-style structured outputs. The method requires no…
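
To make components (ii) and (iii) of the abstract concrete, here is a minimal Python sketch of a bit-aware stopping rule. It is not the authors' implementation: the abstract does not give the formulas, so the entropy-based confidence proxy, the linear rescaling form, the horizon rule, and every name and default below (`token_confidence`, `rescale_confidence`, `confirmation_horizon`, `should_stop`, `penalty`, `base`, `threshold`) are assumptions chosen for illustration. The reasoning-trace stability proxy from (i) is left out for brevity.

```python
import math


def token_confidence(probs):
    """Hypothetical uncertainty proxy: 1 minus the normalized entropy
    of a next-token probability distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0


def rescale_confidence(conf, bits, full_bits=16, penalty=0.5):
    """Assumed linear rescaling: shrink confidence more as the nominal
    bit width drops below full_bits (conservative at low precision)."""
    shrink = 1.0 - penalty * (1.0 - bits / full_bits)
    return conf * shrink


def confirmation_horizon(bits, base=4, full_bits=16):
    """Assumed rule: number of tokens to keep decoding after the final-answer
    marker; the horizon grows as precision falls."""
    return base * math.ceil(full_bits / bits)


def should_stop(confs, tokens_since_marker, bits, threshold=0.6):
    """Halt only if the answer marker has appeared, the confirmation horizon
    has elapsed, and the mean rescaled confidence clears the threshold."""
    if tokens_since_marker is None or not confs:
        return False  # no marker yet, or nothing to judge
    if tokens_since_marker < confirmation_horizon(bits):
        return False  # still inside the post-marker confirmation window
    mean_conf = sum(confs) / len(confs)
    return rescale_confidence(mean_conf, bits) >= threshold
```

Under these assumed defaults, the same raw confidence of 0.9 permits a stop at 16 bits but not at 4 bits, where it is rescaled to 0.5625 and falls below the 0.6 threshold. This mirrors the abstract's aim of avoiding harmful early halting at low precision.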
