PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration
Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan

TL;DR
PolarQuant introduces a polar coordinate-based quantization method for key caches in large language models, effectively handling outliers and significantly accelerating decoding without sacrificing model performance.
Contribution
The paper presents PolarQuant, a novel 2D vector grouping and polar coordinate encoding approach for efficient key cache quantization in large language models.
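As a rough illustration of the 2D grouping and polar encoding idea, the sketch below pairs adjacent dimensions of a single key vector, converts each pair to a radius and an angle, and quantizes both uniformly. The function names, the adjacent-dimension pairing, the bit widths, and the choice to compute radius statistics over one vector are all assumptions made for brevity; they are not taken from the paper, whose grouping of dimensions and quantization statistics may differ.

```python
import math

def polar_quantize(key, r_bits=4, t_bits=4):
    """Quantize one key vector by treating (key[2i], key[2i+1]) as 2D points.

    Hypothetical helper: pairing and statistics are illustrative assumptions.
    """
    assert len(key) % 2 == 0, "key length must be even to form 2D pairs"
    pairs = [(key[i], key[i + 1]) for i in range(0, len(key), 2)]
    radii = [math.hypot(x, y) for x, y in pairs]
    # Map atan2's (-pi, pi] range onto [0, 2*pi).
    angles = [math.atan2(y, x) % (2 * math.pi) for x, y in pairs]

    # Asymmetric uniform quantization of the radius over [r_min, r_max].
    r_min, r_max = min(radii), max(radii)
    r_levels = (1 << r_bits) - 1
    r_scale = (r_max - r_min) / r_levels or 1.0  # avoid a zero scale

    # Uniform bins over the full circle for the angle.
    t_levels = 1 << t_bits
    t_step = 2 * math.pi / t_levels

    r_codes = [round((r - r_min) / r_scale) for r in radii]
    t_codes = [int(a / t_step) % t_levels for a in angles]
    return r_codes, t_codes, (r_min, r_scale, t_step)

def polar_dequantize(r_codes, t_codes, params):
    """Rebuild an approximate key vector from radius and angle codes."""
    r_min, r_scale, t_step = params
    out = []
    for rc, tc in zip(r_codes, t_codes):
        r = r_min + rc * r_scale
        theta = (tc + 0.5) * t_step  # centre of the angle bin
        out.extend((r * math.cos(theta), r * math.sin(theta)))
    return out
```

In this sketch a single large coordinate only stretches the radius range of its pair; the angle is always confined to [0, 2π), which is one way to see why the polar form could be less sensitive to outliers than quantizing raw coordinates.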
Findings
Reduces memory usage of KV caches in large models.
Speeds up decoding by transforming inner products into table lookups.
Maintains model performance comparable to full-precision models.
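One way to picture the table-lookup claim: if each 2D key sub-vector is stored as a radius and a quantized angle code, the query's contribution for every possible angle code can be computed once per decoding step, and each cached key then needs only a lookup and a multiply per pair. The sketch below is a minimal illustration under that assumption; the function names, the adjacent-dimension pairing, and the use of bin centres as representative angles are hypothetical and not taken from the paper.

```python
import math

def build_angle_table(query, t_bits):
    """table[i][c] = dot(query pair i, unit vector at the centre of angle bin c).

    Hypothetical helper; built once per query and reused for all cached keys.
    """
    t_levels = 1 << t_bits
    t_step = 2 * math.pi / t_levels
    units = [(math.cos((c + 0.5) * t_step), math.sin((c + 0.5) * t_step))
             for c in range(t_levels)]
    return [[query[i] * ux + query[i + 1] * uy for ux, uy in units]
            for i in range(0, len(query), 2)]

def lookup_score(table, radii, t_codes):
    """Approximate q.k as sum_i radius_i * table[i][angle_code_i]."""
    return sum(r * table[i][c] for i, (r, c) in enumerate(zip(radii, t_codes)))

# Usage: one query scored against two cached keys (already dequantized radii).
query = [1.0, 2.0, -0.5, 0.25]
table = build_angle_table(query, t_bits=3)
cached_keys = [([2.0, 1.5], [1, 6]), ([0.5, 3.0], [7, 0])]
scores = [lookup_score(table, radii, codes) for radii, codes in cached_keys]
```

The table has one row per 2D pair and one column per angle code, so its size depends on the head dimension and the angle bit width rather than on the number of cached tokens.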
Abstract
The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently addresses the outlier challenge. We observe that outliers typically appear in only one of two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-structured patterns, with radii and angles smoothly distributed in polar coordinates. This alleviates the challenge of outliers on per-channel quantization, making them well-suited for quantization. Thus, PolarQuant divides key…
Taxonomy
Topics: Network Packet Processing and Optimization · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques
