Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization
Yuqiao Wen, Yanshuai Cao, Lili Mou

TL;DR
This paper introduces InvarExplore, a framework for ultra-low-bit quantization of large language models that leverages multiple invariances, including permutation invariance, through a discrete search algorithm, improving performance over existing methods.
Contribution
It presents a novel unified framework that systematically explores multiple model invariances, especially permutation invariance, using a discrete search for ultra-low-bit quantization.
Findings
Achieves performance improvements over state-of-the-art methods.
Compatible with existing quantization techniques.
Effectively explores permutation invariance with discrete search.
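To make the permutation invariance mentioned above concrete, here is a minimal sketch using a toy two-layer ReLU MLP. The network, its sizes, and the helper names (`mlp`, `permute_hidden`) are illustrative assumptions, not taken from the paper. Reordering the hidden units consistently in both layers changes the stored weight matrices but not the function the network computes.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mlp(W1, b1, W2, x):
    """Toy two-layer MLP: y = W2 @ relu(W1 @ x + b1)."""
    h = [max(0.0, a + b) for a, b in zip(matvec(W1, x), b1)]
    return matvec(W2, h)

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1, entries of b1, columns of W2."""
    W1p = [W1[j] for j in perm]
    b1p = [b1[j] for j in perm]
    W2p = [[row[j] for j in perm] for row in W2]
    return W1p, b1p, W2p

# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
d_in, d_hidden, d_out = 3, 6, 2
W1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(d_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

perm = list(range(d_hidden))
rng.shuffle(perm)

y = mlp(W1, b1, W2, x)
y_perm = mlp(*permute_hidden(W1, b1, W2, perm), x)

# The outputs agree up to floating-point summation order.
max_diff = max(abs(a - b) for a, b in zip(y, y_perm))
print(max_diff)
```

Because the permuted weights are laid out differently, a quantizer that groups neighbouring weights can yield a different error for each ordering, which is what makes the ordering worth searching over.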
Abstract
Large language models have been increasing in size due to their success in a wide range of applications. This creates a pressing need to reduce memory usage to make them more accessible. Post-training quantization is a popular technique that uses fewer bits (e.g., 4-8 bits) to represent the model without retraining it. However, quantization in an ultra-low-bit setup (e.g., 2 bits) remains challenging. In this paper, we propose InvarExplore, a unified framework that systematically explores different model invariances at the same time, allowing us to take advantage of the synergy among the types of invariance. Importantly, InvarExplore features a discrete search algorithm that enables us to explore permutation invariance, which is under-studied as it cannot be optimized with gradient-based methods. Results show that InvarExplore is compatible with existing…
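The abstract does not spell out the search procedure, so the following is only a generic hill-climbing sketch of why a discrete search suits permutations. It is not the authors' algorithm. The assumptions here are: 2-bit min-max group quantization, a weight reconstruction error as the objective, and random pairwise swaps as proposals. All function names are hypothetical. A permutation is a discrete object with no gradient, so the sketch proposes a swap, measures the quantization error, and keeps the swap only if the error drops.

```python
import random

BITS = 2
LEVELS = 2 ** BITS  # 4 representable values per group

def quantize_group(values):
    """Min-max uniform quantization of one group; returns dequantized values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / (LEVELS - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]

def quant_error(W, perm, group_size):
    """Squared error after reordering columns by `perm` and quantizing
    each row in contiguous groups of `group_size` weights."""
    err = 0.0
    for row in W:
        permuted = [row[j] for j in perm]
        for s in range(0, len(permuted), group_size):
            g = permuted[s:s + group_size]
            err += sum((a - b) ** 2 for a, b in zip(g, quantize_group(g)))
    return err

def search_permutation(W, group_size, steps, seed=0):
    """Hill climbing over column permutations using random pairwise swaps."""
    rng = random.Random(seed)
    n = len(W[0])
    perm = list(range(n))
    initial = best = quant_error(W, perm, group_size)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]      # propose a swap
        err = quant_error(W, perm, group_size)
        if err < best:
            best = err                           # keep the improvement
        else:
            perm[i], perm[j] = perm[j], perm[i]  # revert the swap
    return perm, initial, best

# Hypothetical weight matrix (4 outputs x 16 hidden units), illustration only.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
perm, initial, best = search_permutation(W, group_size=4, steps=500)
print(initial, best)
```

In a full network, the same permutation would also have to be applied to the rows of the preceding layer so that the model's function stays unchanged; that step is omitted here to keep the sketch short.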
Taxonomy
Topics: Photonic and Optical Devices
