QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization

Xiantao Jiang

arXiv:2605.10959·cs.LG·May 13, 2026

TL;DR

QuIDE introduces a unified metric, the Intelligence Index, that scores quantized neural networks on compression, accuracy, and latency together, and uses it to guide the choice of quantization bit-width across diverse tasks.

Contribution

The paper proposes the Intelligence Index as a single-score evaluation metric and offers a reproducible protocol, including a fitness function, for mixed-precision quantization search.

Findings

01

4-bit quantization is optimal for MNIST and for LLMs (Llama-3-8B).

02

8-bit quantization is optimal for ResNet-18 on ImageNet.

03

4-bit post-training quantization (PTQ) collapses accuracy on complex CNN tasks such as ResNet-18 on ImageNet.

Abstract

There is currently no unified metric for evaluating the efficiency of quantized neural networks. We propose QuIDE, built around the Intelligence Index $I = (C \times P)/\log_2(T+1)$, which collapses the compression-accuracy-latency trade-off into a single score. Experiments across six settings, spanning SimpleCNN (MNIST, CIFAR), ResNet-18 (ImageNet-1K), and Llama-3-8B, show a task-dependent Pareto knee: 4-bit quantization is optimal for MNIST and large LLMs, while 8-bit is the sweet spot for complex CNN tasks (ResNet-18 on ImageNet), where 4-bit PTQ collapses accuracy catastrophically. The accuracy-gated variant $I'$ correctly flags these non-viable configurations that the raw $I$ would reward. QuIDE provides a reproducible evaluation protocol and a ready-to-use fitness function for mixed-precision search.
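
To make the formula in the abstract concrete, here is a minimal Python sketch of the Intelligence Index. The abstract does not define the symbols, so the reading used here (C as compression ratio, P as accuracy, T as latency) is an assumption based on the "compression-accuracy-latency" wording. The abstract also does not specify how the accuracy-gated variant I' works; `gated_index`, its `baseline_performance` argument, and the `max_drop` threshold are hypothetical stand-ins, not the paper's definition.

```python
import math


def intelligence_index(compression, performance, latency):
    """Raw score I = (C * P) / log2(T + 1), as written in the abstract.

    Assumed meanings (not stated in the abstract):
      compression -- C, e.g. original size divided by quantized size
      performance -- P, e.g. task accuracy in [0, 1]
      latency     -- T, inference time in some fixed unit
    """
    if latency <= 0:
        # log2(T + 1) must be strictly positive to divide by it.
        raise ValueError("latency must be positive")
    return (compression * performance) / math.log2(latency + 1)


def gated_index(compression, performance, latency,
                baseline_performance, max_drop=0.05):
    """Hypothetical accuracy gate, NOT the paper's definition of I'.

    Returns 0.0 when performance falls more than max_drop below the
    full-precision baseline, otherwise the raw index.
    """
    if performance < baseline_performance - max_drop:
        return 0.0
    return intelligence_index(compression, performance, latency)
```

With inputs chosen only to keep the arithmetic simple, `intelligence_index(4.0, 0.5, 3.0)` evaluates to `(4.0 * 0.5) / log2(4) = 1.0`. Under this sketch, a configuration whose accuracy has collapsed scores zero from `gated_index` no matter how large its compression ratio is, which is the behaviour the abstract attributes to the gated variant. A function of this shape could serve as the fitness function in a mixed-precision search loop.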
