Empirical and Sustainability Aspects of Software Engineering Research in the Era of Large Language Models: A Reflection

David Williams; Max Hort; Maria Kechagia; Aldeida Aleti; Justyna Petke; Federica Sarro

arXiv:2510.26538 · cs.SE · January 21, 2026

TL;DR

This paper reviews current software engineering research at ICSE that involves large language models, focusing on challenges in benchmarking rigour, contamination, replicability, and sustainability, and offers recommendations for improvement.

Contribution

It provides a structured overview of LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings, and proposes strategies to enhance rigour and sustainability.

Findings

1. Highlights encouraging practices in LLM-based SE research
2. Identifies persistent shortcomings in benchmarking and replicability
3. Recommends strategies to improve sustainability and rigour

Abstract

Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to rigour in benchmarking, contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are addressed in SE. Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE.