An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code
Max Hort, Anastasiia Grishina, Leon Moonen

TL;DR
This study investigates whether publications that train language models on source code for software engineering tasks share their source code and trained artifacts, and how transparently they report training energy use. It finds limited artifact sharing and argues for better reproducibility and sustainability practices.
Contribution
It provides the first comprehensive analysis of source code model sharing and energy transparency, highlighting current deficiencies and proposing recommendations for sustainable research practices.
Findings
Only 27% of studies share artifacts for reuse.
40% of papers do not share source code or trained models.
Many studies lack detailed information on training energy and hardware.
Abstract
Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repeat the expensive training process if models are not shared. The main goal of the study is to investigate if publications that trained language models for software engineering (SE) tasks share source code and trained artifacts. The second goal is to analyze the transparency on training energy usage. We perform a snowballing-based literature search to find publications on language models for source code, and analyze their reusability from a sustainability standpoint. From 494 unique…
Taxonomy
Topics: Green IT and Sustainability · Software Engineering Research
