Piloting Copilot, Codex, and StarCoder2: Hot Temperature, Cold Prompts, or Black Magic?

Jean-Baptiste Döderlein; Nguessan Hermann Kouadio; Mathieu Acher; Djamel Eddine Khelladi; Benoit Combemale

arXiv:2210.14699 · cs.SE · September 9, 2025 · 6 cites

Open Access

TL;DR

This study investigates how input variations like prompts and parameters influence the performance of LLM-based code assistants, revealing significant performance improvements and complex interactions that affect their practical deployment.

Contribution

The paper systematically analyzes how input modifications affect the effectiveness of code assistants across multiple models and benchmarks, highlighting both their potential and their limitations.

Findings

01 Input variations can raise success rates to as high as 79.27%.

02 Optimal settings vary by problem and model.

03 Removing prompts can sometimes improve performance.

Abstract

Language models are promising solutions for tackling increasingly complex problems. In software engineering, they have recently gained attention in code assistants, which generate programs from a natural language task description (prompt). They have the potential to save time and effort but remain poorly understood, which limits their optimal use. In this article, we investigate the impact of input variations on two configurations of a language model, focusing on parameters such as task description, surrounding context, model creativity, and the number of generated solutions. We design specific operators to modify these inputs and apply them to three LLM-based code assistants (Copilot, Codex, StarCoder2) and two benchmarks representing algorithmic problems (HumanEval, LeetCode). Our study examines whether these variations significantly affect program quality and how these effects generalize across…
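To make the idea of input operators concrete, here is a minimal Python sketch of how a prompt might be varied and crossed with sampling settings. It is not the authors' implementation: the operator names (`keep_prompt`, `drop_docstring`, `rename_function`), the `configurations` helper, and the example temperature and sample-count values are all hypothetical and chosen only for illustration.

```python
import itertools
import re

# Matches a triple-double-quoted docstring together with its indentation.
DOCSTRING = re.compile(r'[ \t]*""".*?"""[ \t]*\n?', re.DOTALL)


def keep_prompt(prompt: str) -> str:
    """Baseline operator (hypothetical): leave the prompt unchanged."""
    return prompt


def drop_docstring(prompt: str) -> str:
    """Hypothetical operator: remove the natural language task description."""
    return DOCSTRING.sub("", prompt)


def rename_function(prompt: str, new_name: str = "f") -> str:
    """Hypothetical operator: replace the first function name with a neutral one."""
    return re.sub(r"def\s+\w+\(", f"def {new_name}(", prompt, count=1)


def configurations(prompt, temperatures, sample_counts):
    """Yield one configuration per (operator, temperature, sample count) triple."""
    operators = {
        "original": keep_prompt,
        "no_docstring": drop_docstring,
        "renamed": rename_function,
    }
    for (name, op), temperature, n in itertools.product(
        operators.items(), temperatures, sample_counts
    ):
        yield {
            "operator": name,
            "prompt": op(prompt),
            "temperature": temperature,  # stands in for "model creativity"
            "n": n,                      # number of generated solutions
        }


if __name__ == "__main__":
    stub = 'def add(a, b):\n    """Return the sum of a and b."""\n'
    for config in configurations(stub, temperatures=[0.0, 0.8], sample_counts=[1, 10]):
        print(config["operator"], config["temperature"], config["n"])
```

Each yielded configuration would then be sent to a code assistant and the generated programs checked against the benchmark's tests; that step is omitted here because it depends on the specific assistant's interface.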

Taxonomy

Topics: Software Engineering Research · Reinforcement Learning in Robotics · Software Engineering Techniques and Practices