Optimizing Large Language Model Hyperparameters for Code Generation
Chetan Arora, Ahnaf Ibn Sayeed, Sherlock Licorish, Fanyu Wang, Christoph Treude

TL;DR
This paper systematically explores how four hyperparameters (temperature, top_p, frequency penalty, and presence penalty) affect the quality of code generated by large language models across multiple Python tasks, and reports the value ranges that produced the best results.
Contribution
It presents an exhaustive analysis of hyperparameter impacts on LLM code generation performance, offering practical guidelines for tuning models effectively.
Findings
Optimal performance with temperature below 0.5
Best results with top_p below 0.75
Frequency penalty between -1 and 1.5
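
The ranges listed above can be written as a small check on a candidate configuration. This is a minimal sketch, not code from the paper: the function name `in_reported_range` is hypothetical, and treating every bound as strict (exclusive) is an assumption, since the summary only says "below" and "between".

```python
def in_reported_range(temperature: float, top_p: float, frequency_penalty: float) -> bool:
    """Return True if a configuration falls inside the ranges listed under Findings.

    Assumption: all bounds are exclusive; the summary does not say whether
    the endpoints themselves are included.
    """
    return (
        temperature < 0.5
        and top_p < 0.75
        and -1.0 < frequency_penalty < 1.5
    )


# Example: a conservative setting passes, a high-temperature one does not.
conservative = in_reported_range(temperature=0.2, top_p=0.5, frequency_penalty=0.0)
exploratory = in_reported_range(temperature=1.2, top_p=0.5, frequency_penalty=0.0)
```

A check like this could serve as a guard before sending a request, flagging settings that fall outside the reported ranges.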
Abstract
Large Language Models (LLMs), such as GPT models, are increasingly used in software engineering for various tasks, such as code generation, requirements management, and debugging. While automating these tasks has garnered significant attention, the impact of varying hyperparameters on code generation outcomes has not been studied systematically. This study aims to assess LLMs' code generation performance by exhaustively exploring the impact of various hyperparameters. Hyperparameters for LLMs are adjustable settings that affect the model's behaviour and performance. Specifically, we investigated how changes to four hyperparameters (temperature, top probability (top_p), frequency penalty, and presence penalty) affect code generation outcomes. We systematically adjusted all hyperparameters together, exploring every possible combination by making small increments to each hyperparameter…
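
The exhaustive sweep described in the abstract amounts to a grid search over the four hyperparameters. The sketch below illustrates the idea under stated assumptions: the bounds are those documented for the OpenAI Chat Completions API (temperature 0 to 2, top_p 0 to 1, both penalties -2 to 2), and the step size `STEP` is a hypothetical placeholder, because the increment the authors used is not given in the text above. The helper names `frange` and `all_combinations` are also illustrative.

```python
from itertools import product


def frange(start: float, stop: float, step: float) -> list[float]:
    """Inclusive list of evenly spaced floats, rounded to avoid drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


# Hypothetical increment; the step used in the study is not stated here.
STEP = 0.5

# Assumed bounds, taken from the OpenAI Chat Completions API documentation.
GRID = {
    "temperature": frange(0.0, 2.0, STEP),
    "top_p": frange(0.0, 1.0, STEP),
    "frequency_penalty": frange(-2.0, 2.0, STEP),
    "presence_penalty": frange(-2.0, 2.0, STEP),
}


def all_combinations(grid: dict[str, list[float]]):
    """Yield one dict per point in the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


configs = list(all_combinations(GRID))
```

Each generated dict uses the request parameter names of the OpenAI Chat Completions API, so it could be passed as keyword arguments to a generation call. The number of configurations grows multiplicatively as the step shrinks, which is the main cost of exploring every combination.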
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
