Enhancing Post-Training Quantization via Future Activation Awareness

Zheqi Lv; Zhenxuan Fan; Qi Tian; Wenqiao Zhang; Yueting Zhuang

arXiv:2602.02538·cs.LG·February 4, 2026

TL;DR

This paper introduces Future-Aware Quantization (FAQ), a novel post-training quantization method that uses future-layer activations to improve model compression accuracy and stability without additional training or significant computational overhead.

Contribution

The paper proposes a new quantization approach leveraging future activations and a window-wise preview mechanism to enhance accuracy and robustness in LLM compression.

Findings

01. FAQ outperforms prior PTQ methods in experiments.

02. It requires no backward passes or data reconstruction.

03. It is suitable for edge deployment due to low overhead.

Abstract

Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) based on current-layer activations. Although this method is efficient, it suffers from quantization bias and error accumulation, resulting in suboptimal and unstable quantization, especially when the calibration data is biased. To overcome these issues, we propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. This allows better identification and preservation of important weights, while reducing sensitivity to calibration noise. We further introduce a window-wise preview mechanism to softly aggregate multiple future-layer activations, mitigating over-reliance on any single layer. To avoid expensive greedy search, we use a pre-searched configuration…
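
To make the idea in the abstract concrete, here is a minimal NumPy sketch of how future-layer activations could inform quantization scaling. It is not the paper's algorithm: the abstract does not give the formulas, so everything below is an assumption for illustration. In particular, the function names, the blend coefficient `gamma`, the geometric window weights controlled by `decay`, and the exponent `alpha` are hypothetical, and the per-channel scaling step borrows the general activation-aware scaling pattern rather than anything confirmed by the source. The fixed `gamma` and `decay` values stand in for the "pre-searched configuration" the abstract mentions. The sketch also assumes all layers share the same hidden dimension, so per-channel statistics from different layers can be combined.

```python
import numpy as np

def channel_stats(acts):
    """Mean absolute activation per channel; acts has shape (tokens, channels)."""
    return np.abs(acts).mean(axis=0)

def window_preview(future_acts, decay=0.5):
    """Softly aggregate per-channel stats over a window of future layers.

    Hypothetical weighting: nearer layers get larger weights (decay**j),
    normalized to sum to 1, so no single future layer dominates.
    """
    weights = np.array([decay ** j for j in range(len(future_acts))])
    weights = weights / weights.sum()
    stats = np.stack([channel_stats(a) for a in future_acts])
    return weights @ stats

def future_aware_importance(current_acts, future_acts, gamma=0.5, decay=0.5):
    """Blend current-layer stats with the future-window preview (assumed form)."""
    current = channel_stats(current_acts)
    if not future_acts:
        return current  # fall back to the usual current-layer statistic
    return (1.0 - gamma) * current + gamma * window_preview(future_acts, decay)

def quantize_weight(W, importance, bits=4, alpha=0.5):
    """Scale input channels by importance**alpha, round to nearest, unscale.

    W has shape (out_features, in_features); one step size per output row.
    Returns the dequantized (simulated-quantization) weight matrix.
    """
    s = np.clip(importance, 1e-8, None) ** alpha
    s = s / np.sqrt(s.max() * s.min())           # keep scales centered around 1
    Ws = W * s                                   # important channels get more resolution
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(Ws).max(axis=1, keepdims=True) / qmax
    step = np.where(step == 0, 1.0, step)        # guard against all-zero rows
    Wq = np.clip(np.round(Ws / step), -qmax - 1, qmax) * step
    return Wq / s
```

A caller would collect calibration activations at the current layer and at the next few layers, pass them to `future_aware_importance`, and feed the result to `quantize_weight`. Setting `gamma=0` recovers a purely current-layer statistic, which is the baseline behaviour the abstract attributes to typical PTQ.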

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning