Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation
Md Afif Al Mamun, Sayan Nath, Gias Uddin, Novarun Deb

TL;DR
This study empirically examines how different prompt engineering strategies in open-source Small Language Models affect code generation accuracy and environmental sustainability, revealing opportunities for greener AI practices.
Contribution
It is the first systematic empirical analysis of prompt strategies' impact on both accuracy and sustainability in SLM-based code generation.
Findings
Chain-of-Thought prompts balance reasoning and energy efficiency.
Multi-sampling strategies often lead to higher energy costs with minimal gains.
Regional grid carbon intensity significantly influences deployment emissions.
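The link between measured energy and regional emissions in the last finding can be made concrete with a short calculation. The sketch below assumes the standard conversion (emissions = energy consumed × grid carbon intensity); the function names and the two intensity values are hypothetical placeholders for illustration, not figures from the paper.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def joules_to_kwh(energy_joules: float) -> float:
    """Convert an energy measurement from joules to kilowatt-hours."""
    return energy_joules / JOULES_PER_KWH


def emissions_g_co2e(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Estimate emissions (grams CO2e) for a given energy use and grid intensity."""
    return energy_kwh * intensity_g_per_kwh


# Hypothetical grid intensities in gCO2e/kWh (placeholders, not measured data).
example_grids = {"low_carbon_grid": 50.0, "high_carbon_grid": 700.0}

# Hypothetical energy cost of one benchmark run, in joules.
run_energy_kwh = joules_to_kwh(7.2e6)

for grid_name, intensity in example_grids.items():
    grams = emissions_g_co2e(run_energy_kwh, intensity)
    print(f"{grid_name}: {grams:.1f} gCO2e")
```

Under these placeholder numbers the same run emits fourteen times more on the high-carbon grid, which is why identical prompting strategies can have very different footprints depending on where the model is deployed.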
Abstract
The shift from cloud-hosted Large Language Models (LLMs) to locally deployed open-source Small Language Models (SLMs) has democratized AI-assisted coding; however, it has also decentralized the environmental footprint of AI. While prompting strategies, such as Chain-of-Thought and ReAct, serve as external mechanisms for optimizing code generation without modifying model parameters, their impact on energy consumption and carbon emissions remains largely invisible to developers. This paper presents the first systematic empirical study of how different prompt engineering strategies in SLM-based code generation affect both accuracy and sustainability. We evaluate six prominent prompting strategies across 11 open-source models (ranging from 1B to 34B parameters) using the HumanEval+ and MBPP+ benchmarks. By measuring Pass@1 accuracy alongside energy…
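For readers unfamiliar with the Pass@1 metric named in the abstract, here is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), averaged over problems. Whether the paper uses this estimator or a single greedy sample per problem is not stated in the text above, so treat this as an assumption; the function names and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples passes, given c of n passed."""
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (samples drawn, samples passed)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem outcomes with one sample each (1 = passed, 0 = failed).
single_sample_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(mean_pass_at_k(single_sample_results, k=1))
```

With one sample per problem, pass@1 reduces to the fraction of problems whose generated solution passes all tests.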
