An Empirical Study of Sustainability in Prompt-driven Test Script Generation Using Small Language Models
Pragati Kumari, Novarun Deb

TL;DR
This paper empirically investigates the environmental impact and performance tradeoffs of small language models in prompt-driven test script generation, highlighting how model choice and prompt design influence sustainability.
Contribution
It provides the first detailed empirical analysis of energy consumption, carbon emissions, and performance of small language models during automated test script generation.
Findings
Different SLMs have distinct sustainability profiles.
Prompt structure and model choice jointly affect environmental and performance outcomes.
Some models favor lower energy use and faster execution.
Abstract
The increasing use of language models in automated test script generation raises concerns about their environmental impact, yet existing sustainability analyses focus predominantly on large language models. As a result, the energy and carbon characteristics of small language models (SLMs) during prompt-driven unit-test script generation remain largely unexplored. To address this gap, this study empirically examines the environmental and performance tradeoffs of SLMs (in the 2B-8B parameter range) using the HumanEval benchmark and adaptive prompt variants (based on the Anthropic template). The analysis uses CodeCarbon to characterize energy consumption, carbon emissions, and duration under controlled conditions, with unit-test script coverage serving as an initial proxy for generated test quality. Our results show that different SLMs exhibit distinct sustainability profiles - some favor…
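To make the measurement setup described in the abstract more concrete, the following is a minimal sketch of how per-run energy, emissions, duration, and coverage could be recorded and averaged per model and prompt variant. It is not the authors' code: the `RunRecord` fields and the `summarize` helper are hypothetical names introduced for illustration, and the sketch assumes the raw values would come from a tracker such as CodeCarbon and from a coverage tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One test-generation run (hypothetical schema, not from the paper)."""
    model: str            # SLM identifier
    prompt_variant: str   # label of the prompt variant used
    energy_kwh: float     # energy consumed during the run
    emissions_kg: float   # estimated CO2-equivalent emissions
    duration_s: float     # wall-clock duration of the run
    coverage: float       # coverage of the generated tests, in [0, 1]


def summarize(records):
    """Average each metric per (model, prompt_variant) pair."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.model, record.prompt_variant)].append(record)

    summary = {}
    for key, runs in groups.items():
        summary[key] = {
            "runs": len(runs),
            "mean_energy_kwh": mean(r.energy_kwh for r in runs),
            "mean_emissions_kg": mean(r.emissions_kg for r in runs),
            "mean_duration_s": mean(r.duration_s for r in runs),
            "mean_coverage": mean(r.coverage for r in runs),
        }
    return summary
```

Grouping by both model and prompt variant reflects the finding that the two jointly affect outcomes; comparing the resulting rows side by side is one simple way to see which configurations trade coverage against energy or time.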
