Greener yet Powerful: Taming Large Code Generation Models with Quantization
Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang

TL;DR
This paper explores quantization techniques to compress large code generation models, significantly reducing resource usage and carbon footprint while maintaining accuracy and robustness, enabling deployment on standard laptops.
Contribution
The study identifies an effective quantization recipe that allows large models to run efficiently on limited hardware without substantial performance loss.
Findings
Quantization reduces model size and latency significantly.
The proposed method maintains accuracy and robustness in code generation.
The approach applies to both code generation and summarization tasks.
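To make the size claim concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration only: the summary above does not spell out the paper's actual recipe, and the function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical. It shows why storing weights as 8-bit integers instead of 32-bit floats cuts storage by roughly 4x, at the cost of a bounded rounding error.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to int8 (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights, stored as 32-bit floats.
weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.07, 0.0])

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = weights.itemsize * len(weights)  # 4 bytes per weight
int8_bytes = q.itemsize * len(q)              # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.6f} (half a step = {scale / 2:.6f})")
```

The reconstruction error per weight is at most half a quantization step, which is one intuition for why accuracy can survive quantization; production schemes typically use finer-grained scales (for example per channel) than this single per-tensor scale.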
Abstract
ML-powered code generation aims to assist developers to write code in a more productive manner, by intelligently generating code blocks based on natural language prompts. Recently, large pretrained deep learning models have substantially pushed the boundary of code generation and achieved impressive performance. Despite their great power, the huge number of model parameters poses a significant threat to adapting them in a regular software development environment, where a developer might use a standard laptop or mid-size server to develop her code. Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint. Model compression is a promising approach to address these challenges. Several techniques are proposed to compress large pretrained models typically used for vision or textual data. Out of many available compression…
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Parallel Computing and Optimization Techniques
