PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling
Yuxuan Yue, Zukang Xu, Zhihang Yuan, Dawei Yang, Jianlong Wu, Liqiang Nie

TL;DR
This paper introduces PCDVQ, a novel vector quantization method that decouples direction and magnitude in polar coordinates to improve the accuracy of large language models at low-bit quantization levels.
Contribution
The paper proposes PCDVQ, a new VQ framework with polar coordinate decoupling and distribution-aligned codebooks, enhancing quantization accuracy for LLMs.
Findings
PCDVQ outperforms baseline methods at 2-bit quantization by at least 1.5% in zero-shot accuracy.
Decoupling direction and magnitude reduces quantization errors significantly.
Polar coordinate transformation improves the sensitivity of quantization to important vector features.
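One way to see how direction and magnitude both enter a Euclidean quantization error is the law of cosines. This is a standard identity, not a derivation taken from the paper, and the symbols are assumptions introduced here: $v$ is a weight vector with magnitude $r = \lVert v \rVert$, $\hat{v}$ is its quantized version with magnitude $\hat{r} = \lVert \hat{v} \rVert$, and $\theta$ is the angle between them.

```latex
\[
\lVert v - \hat{v} \rVert^2
  = r^2 + \hat{r}^2 - 2 r \hat{r} \cos\theta
  = \underbrace{(r - \hat{r})^2}_{\text{magnitude term}}
  + \underbrace{2 r \hat{r}\,(1 - \cos\theta)}_{\text{direction term}}
\]
```

The first term depends only on the magnitudes and the second vanishes when the directions agree, so a coupled Euclidean objective trades the two against each other inside a single number, whereas a decoupled scheme can treat each term with its own metric.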
Abstract
Large Language Models (LLMs) face significant challenges in edge deployment due to their massive parameter scale. Vector Quantization (VQ), a clustering-based quantization method, is a prevalent solution to this issue because it reaches extremely low bit-widths (even 2-bit) while retaining considerable accuracy. A vector has both direction and magnitude, and existing VQ works typically quantize the two in a coupled manner. However, we find that direction is significantly more sensitive to quantization than magnitude. For instance, when separately clustering the directions and magnitudes of weight vectors in LLaMA-2-7B, the accuracy drops on zero-shot tasks are 46.5% and 2.3%, respectively. This gap widens further as the number of clustering centers decreases. Further, Euclidean distance, a common metric to assess vector similarities in current…
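The decoupling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the vector dimension (8), the codebook sizes (256 directions, 4 magnitude levels), the random-unit-vector direction codebook, and the quantile-based magnitude codebook are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows the general shape of the pipeline: split each vector into a unit direction and a magnitude, match the direction by cosine similarity, match the magnitude by scalar distance, then recombine.

```python
import numpy as np

def decouple(vectors):
    """Split each row into a unit direction and a non-negative magnitude."""
    magnitudes = np.linalg.norm(vectors, axis=1)
    directions = vectors / np.maximum(magnitudes, 1e-12)[:, None]
    return directions, magnitudes

def quantize_direction(directions, dir_codebook):
    """Index of the codeword with the highest cosine similarity.

    Both inputs have unit-norm rows, so the dot product is the cosine."""
    return np.argmax(directions @ dir_codebook.T, axis=1)

def quantize_magnitude(magnitudes, mag_codebook):
    """Index of the scalar level closest in absolute distance."""
    return np.argmin(np.abs(magnitudes[:, None] - mag_codebook[None, :]), axis=1)

def reconstruct(dir_idx, mag_idx, dir_codebook, mag_codebook):
    """Recombine the selected magnitude level and direction codeword."""
    return mag_codebook[mag_idx][:, None] * dir_codebook[dir_idx]

rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 8))  # toy stand-in for LLM weight vectors

directions, magnitudes = decouple(weights)

# Hypothetical codebooks (assumptions, not the paper's construction):
# random unit vectors for directions, empirical quantiles for magnitudes.
dir_codebook = rng.standard_normal((256, 8))
dir_codebook /= np.linalg.norm(dir_codebook, axis=1, keepdims=True)
mag_codebook = np.quantile(magnitudes, np.linspace(0.05, 0.95, 4))

dir_idx = quantize_direction(directions, dir_codebook)
mag_idx = quantize_magnitude(magnitudes, mag_codebook)
weights_hat = reconstruct(dir_idx, mag_idx, dir_codebook, mag_codebook)

rel_err = np.sum((weights - weights_hat) ** 2) / np.sum(weights ** 2)
print("relative squared error:", rel_err)
```

Because the two indices are stored separately, the bit budget can be divided unevenly between direction and magnitude simply by changing the two codebook sizes.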
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper provides an insightful observation about vector-based quantization: the difference in the approximation behavior of direction and magnitude.
- The idea of decoupling magnitude and direction components and handling their different distributions is creative and conceptually elegant.
- The proposed method achieves strong performance compared to established baselines.
- The experiments primarily focus on a range of LLaMA models, with only a single Mistral experiment included. Broader evaluation across different architectures would strengthen the paper.
- The comparison with scalar-based quantization methods is limited and could be expanded for a fairer assessment.
- While the idea is simple and well-motivated, its conceptual simplicity raises questions about whether it is substantial enough for a full-length scientific paper.
1. The paper shows direction is markedly more sensitive to quantization than magnitude, and analyzes why Euclidean MSE emphasizes magnitude errors more strongly, supporting the decoupling design. The motivation of the work is clear and reasonable.
2. This work provides a clear and comprehensive theoretical foundation for polar coordinate decoupling, demonstrating strong depth and theoretical rigor.
3. Across multiple LLM families and standard zero-shot benchmarks, the main results tables show tha
1. The choice to allocate more bits to direction is well supported by experiments, but the paper offers no formal analysis to guide the split or to select an optimal allocation under different conditions.
2. The method adopts a fixed vector dimension and borrows several settings from prior work, but it remains unclear how to adapt the dimension or the direction–magnitude bit split across model sizes, layer types, or differing weight statistics. Robustness to these design choices is not systemati
1. Introducing polar decoupled vector quantization is an interesting and novel attempt.
2. The overall writing is clear and easy to follow.
3. The method demonstrates superior performance on several large language models, including LLaMA-2/3 and Mistral, achieving better zero-shot accuracy and perplexity at the 2-bit weight quantization level compared with existing state-of-the-art quantization approaches, which validates its effectiveness.
1. The PCDVQ framework introduces additional computational steps, including polar coordinate conversion, two independent codebook searches using cosine similarity and Euclidean distance respectively, and possible inverse conversion. The paper reports improved throughput mainly due to reduced memory bandwidth, but it does not quantify the impact of these added operations on single inference latency. This is important because on many edge devices compute cost is more critical than memory bandwidth
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Stochastic Gradient Optimization Techniques
