Smart but Costly? Benchmarking LLMs on Functional Accuracy and Energy Efficiency

Mohammadjavad Mehditabar; Saurabhsingh Rajput; Antonio Mastropaolo; Tushar Sharma

arXiv:2511.07698·cs.SE·November 12, 2025


Open Access

TL;DR

This paper introduces BRACE, a framework for benchmarking large language models on code tasks, evaluating their energy efficiency and accuracy to guide sustainable and effective model selection.

Contribution

The paper presents a novel benchmarking framework, BRACE, with two rating methods for assessing energy efficiency and accuracy trade-offs in code language models.

Findings

01 Models generally perform better on code summarization than on code generation.

02 Model size does not significantly affect ratings.

03 BRACE enables evidence-based model selection that balances sustainability and performance.

Abstract

The rapid advancement of AI technologies and their accelerated adoption in software development necessitate a systematic evaluation of their environmental impact alongside functional correctness. While prior studies have examined sustainability in large language models, existing approaches lack systematic frameworks for evaluating accuracy-energy trade-offs in Code Language Models (CLMs). In this paper, we present a framework, BRACE, to benchmark CLMs on a unified scale of energy efficiency and functional correctness (referred to as accuracy). We benchmark 22 state-of-the-art models on code generation and summarization tasks, proposing two rating methods: Concentric Incremental Rating Circles (CIRC) and Observation to Expectation Rating (OTER). CIRC provides deterministic Euclidean-based rankings with static trade-offs that are robust to outliers, and OTER offers trend-aware evaluation…
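
The abstract describes CIRC only as a deterministic, Euclidean-based rating, so the following is a minimal sketch of that general idea and not the paper's actual procedure. It assumes accuracy is already in [0, 1], that energy is min-max normalized into an efficiency score, that the ideal point is (1, 1), and that ratings come from equal-width concentric rings around that point. The function name `circ_style_ratings`, the five rating levels, and the example figures are all hypothetical.

```python
import math

def circ_style_ratings(models, levels=5):
    """Rate models by Euclidean distance to an assumed ideal point (1, 1).

    models: dict mapping name -> (accuracy in [0, 1], energy in joules).
    Returns a dict mapping name -> integer rating, where `levels` is best.
    """
    energies = [energy for _, energy in models.values()]
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0  # avoid division by zero if all energies match

    # Assumption: split the largest possible distance, sqrt(2), into
    # `levels` rings of equal width centered on the ideal point.
    ring_width = math.sqrt(2) / levels

    ratings = {}
    for name, (accuracy, energy) in models.items():
        efficiency = 1.0 - (energy - lo) / span  # 1.0 = lowest energy use
        distance = math.hypot(1.0 - accuracy, 1.0 - efficiency)
        ring = min(int(distance / ring_width), levels - 1)
        ratings[name] = levels - ring  # innermost ring gets the top rating
    return ratings

# Made-up numbers purely for illustration; not results from the paper.
example = {
    "model_a": (0.80, 120.0),
    "model_b": (0.60, 40.0),
    "model_c": (0.30, 200.0),
}
ratings = circ_style_ratings(example)
print(ratings)
```

Because the rating depends only on a fixed distance to a fixed point, the same inputs always yield the same rating, which is the deterministic property the abstract attributes to CIRC. The min-max normalization used here is a simplification and is not itself outlier-robust; the paper's normalization and ring construction may differ.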


Taxonomy

Topics: Green IT and Sustainability · Software Engineering Research · Software System Performance and Reliability