ScaleBITS: Scalable Bitwidth Search for Hardware-Aligned Mixed-Precision LLMs

Xinlin Li; Timothy Chou; Josh Fromm; Zichang Liu; Yunjie Pan; Christina Fragouli

arXiv:2602.17698 · cs.LG · February 23, 2026

TL;DR

ScaleBITS introduces a scalable, hardware-efficient mixed-precision quantization method for large language models, optimizing bitwidth allocation to significantly reduce memory and inference costs while maintaining performance.

Contribution

The paper presents a new sensitivity analysis, a hardware-aligned weight-partitioning scheme, and a scalable optimization algorithm for automated mixed-precision quantization.
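To make "hardware-aligned weight partitioning" concrete, here is a minimal sketch under stated assumptions. The abstract says the block-wise partitioning is "powered by bi-directional channel reordering". One plausible reading, assumed here, is that both output channels (rows) and input channels (columns) of a weight matrix are permuted so that high-sensitivity weights cluster into fixed-size tiles, each of which can then receive a single bitwidth. The sensitivity proxy (squared weights), the tile size, and the helper names `reorder_channels` and `block_sensitivity` are hypothetical and do not reproduce the paper's sensitivity analysis.

```python
import numpy as np


def reorder_channels(sens: np.ndarray):
    """Hypothetical bi-directional reordering: permutations that sort output
    channels (rows) and input channels (columns) by total sensitivity,
    most sensitive first."""
    row_order = np.argsort(-sens.sum(axis=1), kind="stable")
    col_order = np.argsort(-sens.sum(axis=0), kind="stable")
    return row_order, col_order


def block_sensitivity(sens: np.ndarray, block: int) -> np.ndarray:
    """Sum per-weight sensitivity over non-overlapping block x block tiles."""
    rows, cols = sens.shape
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")
    tiles = sens.reshape(rows // block, block, cols // block, block)
    return tiles.sum(axis=(1, 3))


# Toy example: squared weights stand in for sensitivity (an assumption), with
# one sensitive output channel and one sensitive input channel injected.
rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 8))
sens = weights ** 2
sens[5, :] *= 50.0
sens[:, 2] *= 50.0

row_order, col_order = reorder_channels(sens)
reordered = sens[np.ix_(row_order, col_order)]
print(block_sensitivity(sens, 4))       # per-tile sensitivity before reordering
print(block_sensitivity(reordered, 4))  # after: most sensitive channels come first
```

Permuting channels leaves the layer's function unchanged only if the same permutations are applied to its inputs and to the consumers of its outputs; in many layouts this can be folded into adjacent layers offline, but how ScaleBITS handles it is not described in this excerpt.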

Findings

01. Up to 36% improvement over uniform quantization

02. Outperforms state-of-the-art sensitivity-aware methods by up to 13%

03. Achieves ultra-low-bit quantization without runtime overhead

Abstract

Post-training weight quantization is crucial for reducing the memory and inference cost of large language models (LLMs), yet pushing the average precision below 4 bits remains challenging due to highly non-uniform weight sensitivity and the lack of principled precision allocation. Existing solutions use irregular fine-grained mixed-precision with high runtime overhead or rely on heuristics or highly constrained precision allocation strategies. In this work, we propose ScaleBITS, a mixed-precision quantization framework that enables automated, fine-grained bitwidth allocation under a memory budget while preserving hardware efficiency. Guided by a new sensitivity analysis, we introduce a hardware-aligned, block-wise weight partitioning scheme, powered by bi-directional channel reordering. We formulate global bitwidth allocation as a constrained optimization problem and develop a scalable…
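The abstract frames global bitwidth allocation as a constrained optimization problem under a memory budget, but the description of its scalable solver is cut off here. The sketch below is therefore not ScaleBITS's algorithm. It illustrates the problem shape with a common greedy marginal-gain heuristic, assuming: per-block sensitivity scores s_i are already given; the error of block i at b_i bits is modeled as s_i · 2^(−2·b_i); the goal is to minimize Σ_i s_i · 2^(−2·b_i) subject to Σ_i n_i · b_i ≤ B · Σ_i n_i, where n_i is the block's weight count and B the average-bit budget; candidate bitwidths are {2, 3, 4}; and the budget counts only weight bits, ignoring scales and other metadata. `Block`, `quant_error`, and `allocate_bits` are hypothetical names.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Block:
    name: str           # hypothetical block identifier
    num_weights: int    # weights in this hardware-aligned block
    sensitivity: float  # assumed per-block sensitivity score (given as input)


def quant_error(sensitivity: float, bits: int) -> float:
    # Assumed error model: quantization noise power ~ 2^(-2b), scaled by sensitivity.
    return sensitivity * 2.0 ** (-2 * bits)


def allocate_bits(blocks, avg_bits, choices=(2, 3, 4)):
    """Greedy marginal-gain allocation under an average-bitwidth budget."""
    choices = sorted(choices)
    total = sum(b.num_weights for b in blocks)
    budget = avg_bits * total          # total weight bits allowed
    used = choices[0] * total          # every block starts at the lowest bitwidth
    if used > budget:
        raise ValueError("budget is below the smallest candidate bitwidth")
    level = [0] * len(blocks)          # index into `choices` for each block

    def upgrade(i):
        # Error reduction per extra bit, and the bit cost, of raising block i one level.
        b = blocks[i]
        cur, nxt = choices[level[i]], choices[level[i] + 1]
        gain = quant_error(b.sensitivity, cur) - quant_error(b.sensitivity, nxt)
        cost = (nxt - cur) * b.num_weights
        return gain / cost, cost

    # Max-heap (via negated keys) holding the next possible upgrade of each block.
    heap = [(-upgrade(i)[0], i) for i in range(len(blocks)) if len(choices) > 1]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        _, cost = upgrade(i)
        if used + cost > budget:
            continue                   # unaffordable; cheaper upgrades may still fit
        used += cost
        level[i] += 1
        if level[i] + 1 < len(choices):
            heapq.heappush(heap, (-upgrade(i)[0], i))
    bits = {b.name: choices[level[i]] for i, b in enumerate(blocks)}
    return bits, used / total


# Usage with made-up blocks and an average budget of 3 bits per weight.
blocks = [
    Block("attn.q[0]", 4096, 10.0),
    Block("attn.q[1]", 4096, 1.0),
    Block("mlp.up[0]", 8192, 0.5),
    Block("mlp.up[1]", 8192, 5.0),
]
bits, avg = allocate_bits(blocks, avg_bits=3.0)
print(bits, avg)
```

Ranking upgrades by error reduction per extra bit is a standard knapsack-style heuristic: it is cheap and scales to many blocks, but it is not guaranteed to find the optimal allocation.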

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Neural Network Applications · Speech Recognition and Synthesis