Greening Large Language Models of Code
Jieke Shi, Zhou Yang, Hong Jin Kang, Bowen Xu, Junda He, David Lo

TL;DR
This paper introduces Avatar, a method to create compact, energy-efficient, and deployable models of code from large language models, achieving significant reductions in size, energy use, and latency with minimal performance loss.
Contribution
Avatar formulates model optimization as a multi-objective configuration tuning problem, solved with a Satisfiability Modulo Theories (SMT) solver and a tailored optimization algorithm, enabling deployment on resource-constrained devices.
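To make the multi-objective framing concrete, here is a minimal Python sketch of Pareto-dominance filtering over candidate model configurations. It is not Avatar's implementation. The `Candidate` fields, the `max_size_mb` budget (a plain hard-constraint filter standing in for, not reproducing, SMT-based pruning), and every number in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical compressed-model configuration with measured costs."""
    name: str
    size_mb: float     # lower is better
    latency_ms: float  # lower is better
    energy_j: float    # lower is better (hypothetical energy proxy)
    accuracy: float    # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = (
        a.size_mb <= b.size_mb
        and a.latency_ms <= b.latency_ms
        and a.energy_j <= b.energy_j
        and a.accuracy >= b.accuracy
    )
    strictly_better = (
        a.size_mb < b.size_mb
        or a.latency_ms < b.latency_ms
        or a.energy_j < b.energy_j
        or a.accuracy > b.accuracy
    )
    return no_worse and strictly_better


def pareto_front(cands, max_size_mb=None):
    """Drop candidates over the (hypothetical) size budget, then drop dominated ones."""
    feasible = [c for c in cands if max_size_mb is None or c.size_mb <= max_size_mb]
    return [
        c for c in feasible
        if not any(dominates(o, c) for o in feasible if o is not c)
    ]


# Toy, made-up candidates for illustration only (not results from the paper).
A = Candidate("A", size_mb=3, latency_ms=10, energy_j=1.0, accuracy=0.90)
B = Candidate("B", size_mb=3, latency_ms=12, energy_j=1.2, accuracy=0.88)
C = Candidate("C", size_mb=5, latency_ms=8, energy_j=0.9, accuracy=0.92)
D = Candidate("D", size_mb=50, latency_ms=5, energy_j=0.5, accuracy=0.95)

if __name__ == "__main__":
    print([c.name for c in pareto_front([A, B, C, D], max_size_mb=10)])
```

With a 10 MB budget, D is excluded outright, B is dominated by A, and A and C both survive because each wins on at least one objective. A real tool would then pick among such trade-off points; how Avatar makes that choice is described in the paper, not here.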
Findings
Models reduced to 3 MB, 160× smaller than original.
Energy consumption decreased up to 184×.
Inference latency improved up to 76×.
Abstract
Large language models of code have shown remarkable effectiveness across various software engineering tasks. Despite the availability of many cloud services built upon these powerful models, there remain several scenarios where developers cannot take full advantage of them, stemming from factors such as restricted or unreliable internet access, institutional privacy policies that prohibit external transmission of code to third-party vendors, and more. Therefore, developing a compact, efficient, and yet energy-saving model for deployment on developers' devices becomes essential. To this aim, we propose Avatar, a novel approach that crafts a deployable model from a large language model of code by optimizing it in terms of model size, inference latency, energy consumption, and carbon footprint while maintaining a comparable level of effectiveness. The key idea of Avatar is to formulate…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Engineering Techniques and Practices
