Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models
Haidong Kang, Jun Du, Lihong Lin

TL;DR
This paper introduces TAP, a training-free, LLM-driven framework that automatically discovers proxies for mixed-precision quantization, eliminating the need for manual design or costly optimization, and achieves state-of-the-art results.
Contribution
Proposes a novel LLM-based, training-free proxy discovery framework for MPQ that automates proxy design using evolutionary search and preference optimization.
Findings
Reports state-of-the-art mixed-precision quantization results on the benchmarks evaluated in the paper.
Eliminates manual proxy design and training for MPQ.
Demonstrates effectiveness of LLM-driven proxy discovery.
Abstract
Mixed-Precision Quantization (MPQ) liberates Deep Neural Networks (DNNs) from the Out-Of-Memory (OOM) bottleneck and has garnered increasing research attention. However, conventional methods either rely on costly differentiable optimization search, which is neither efficient nor flexible, or learn a quantized DNN from a proxy (e.g., HAWQ) manually designed by human experts, which is labor-intensive and requires extensive expert knowledge. Can we design a proxy without involving any human experts or training? In this paper, we provide an affirmative answer by proposing a novel Large Language Model (LLM)-driven Training-free Automatic Proxy (dubbed TAP) discovery framework. It reforms the design paradigm of MPQ by utilizing LLMs and evolutionary search strategies to automatically find superior TAP tailored for MPQ. In addition, to bridge the gap between black-box LLMs and the challenging…
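To make the role of a proxy in MPQ concrete, the following is a minimal sketch of how per-layer proxy scores could drive bit-width allocation under a memory budget. It is not the TAP method described in the paper: the `Layer` class, the `sensitivity` field, the `allocate_bits` function, the greedy strategy, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    # Hypothetical proxy score: higher means the layer is assumed to be
    # more fragile under low-bit quantization.
    sensitivity: float


def allocate_bits(layers, budget_bits, choices=(2, 4, 8)):
    """Greedy sketch: start every layer at the lowest bit-width, then
    upgrade layers in order of sensitivity per parameter while the total
    weight-memory budget (in bits) is not exceeded."""
    choices = sorted(choices)
    bits = {layer.name: choices[0] for layer in layers}
    used = sum(layer.num_params * choices[0] for layer in layers)
    if used > budget_bits:
        raise ValueError("budget too small even at the lowest bit-width")

    ranked = sorted(
        layers, key=lambda l: l.sensitivity / l.num_params, reverse=True
    )
    for layer in ranked:
        # Try the highest bit-width first, falling back to lower ones.
        for b in reversed(choices[1:]):
            extra = layer.num_params * (b - bits[layer.name])
            if used + extra <= budget_bits:
                bits[layer.name] = b
                used += extra
                break
    return bits, used


# Illustrative three-layer model with an average budget of 4 bits per weight.
layers = [
    Layer("conv1", 1_000, 5.0),
    Layer("conv2", 4_000, 2.0),
    Layer("fc", 10_000, 1.0),
]
bits, used = allocate_bits(layers, budget_bits=15_000 * 4)
print(bits, used)
```

In this sketch the proxy scores are supplied by hand. In a training-free setting they would come from a scoring function evaluated on the untrained or pretrained network, and the quality of that scoring function is what proxy discovery aims to improve.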
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
