Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy

Negar Alizadeh; Boris Belchev; Nishant Saurabh; Patricia Kelbert; Fernando Castor

arXiv:2412.00329 · cs.SE · January 20, 2025

Open Access

TL;DR

This paper evaluates the trade-offs between energy consumption and accuracy in 18 families of language models used for software development tasks, highlighting the efficiency of quantized models and the lack of a one-size-fits-all solution.

Contribution

The paper provides an empirical analysis of the accuracy and energy efficiency of locally deployed language models on software development tasks across different hardware setups.

Findings

01. Larger models with higher energy budgets do not always yield better accuracy.

02. Quantized large models often outperform full-precision medium models in efficiency and accuracy.

03. No single model suits all software development tasks.
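
One way to read the findings above is as a Pareto comparison: a model is worth considering only if no other model is both cheaper in energy and at least as accurate. The following is a minimal sketch of that idea, not the authors' method. The `Run` type, the model labels, and all numbers are hypothetical placeholders chosen for illustration and are not taken from the paper.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str        # hypothetical model label, not from the paper
    energy_j: float  # energy per task in joules (placeholder value)
    accuracy: float  # fraction of tasks solved (placeholder value)


def pareto_front(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run dominates.

    A run is dominated when another run uses no more energy, is at least
    as accurate, and is strictly better on at least one of the two.
    """
    front = []
    for r in runs:
        dominated = any(
            o.energy_j <= r.energy_j
            and o.accuracy >= r.accuracy
            and (o.energy_j < r.energy_j or o.accuracy > r.accuracy)
            for o in runs
        )
        if not dominated:
            front.append(r)
    # Sort by energy so the cheapest surviving option comes first.
    return sorted(front, key=lambda r: r.energy_j)


# Illustrative placeholder measurements only.
runs = [
    Run("small-fp16", 40.0, 0.30),
    Run("medium-fp16", 120.0, 0.45),
    Run("large-int4", 110.0, 0.55),
    Run("large-fp16", 300.0, 0.54),
]

print([r.name for r in pareto_front(runs)])
```

With these made-up numbers, the quantized large model dominates both the full-precision medium model and the full-precision large model, while the small model stays on the front because nothing is cheaper. Running the same filter per task would be one way to see whether a different model wins on each task.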

Abstract

The use of generative AI-based coding assistants like ChatGPT and GitHub Copilot is a reality in contemporary software development. Many of these tools are provided as remote APIs. Using third-party APIs raises data privacy and security concerns for client companies, which motivates the use of locally deployed language models. In this study, we explore the trade-off between model accuracy and energy consumption, aiming to provide valuable insights to help developers make informed decisions when selecting a language model. We investigate the performance of 18 families of LLMs in typical software development tasks on two real-world infrastructures: a commodity GPU and a powerful AI-specific GPU. Given that deploying LLMs locally requires powerful infrastructure, which might not be affordable for everyone, we consider both full-precision and quantized models. Our findings reveal that…

Taxonomy

Topics: Software Engineering Techniques and Practices · Software Engineering Research · Software System Performance and Reliability