An Exploratory Study of Bayesian Prompt Optimization for Test-Driven Code Generation with Large Language Models

Shlok Tomar; Aryan Deshwal; Ethan Villalovoz; Mattia Fazzini; Haipeng Cai; Janardhan Rao Doppa

arXiv:2512.15076·cs.SE·December 18, 2025

TL;DR

This paper introduces BODE-GEN, a Bayesian optimization method that adaptively searches over prompts in a continuous embedding space to improve the functional correctness of code generated by large language models. It is reported to be sample-efficient and effective across multiple LLMs and benchmarks.

Contribution

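The paper proposes a Bayesian optimization (BO) approach to prompt optimization for code generation. It searches in a continuous embedding space and uses an auxiliary LLM to bridge that space and the discrete prompt space, with the aim of improving on fixed prompts.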

Findings

01 BODE-GEN improves code accuracy over fixed prompts.

02 It is sample-efficient, requiring few iterations.

03 Effective across multiple LLMs and benchmarks.

Abstract

We consider the task of generating functionally correct code using large language models (LLMs). The correctness of generated code is influenced by the prompt used to query the given base LLM. We formulate the problem of finding the appropriate prompt as a combinatorial search process and propose a Bayesian optimization (BO) approach referred to as BO for Code GENeration (BODE-GEN). BODE-GEN performs an adaptive, data-driven search over prompts guided by training data in the form of prompts tried and the functional accuracy of the generated code over a set of given test cases. The key insight is to perform BO in a continuous embedding space by using an auxiliary LLM to bridge the gap between the discrete prompt space and the continuous embedding space. We leverage two synergistic ideas, namely, random projections and dimensionality-scaled priors, to build effective Gaussian process based…
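The search loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`bo_search`, `toy_accuracy`, `gp_posterior`) is hypothetical, the auxiliary LLM that maps an embedding back to a prompt is omitted, and the objective is a synthetic stand-in for the fraction of test cases passed. The sketch assumes a Gaussian random projection from a low-dimensional search space into the embedding space, and it approximates the idea of a dimensionality-scaled prior with a fixed kernel lengthscale proportional to the square root of the search dimension, which is only one possible reading of that term.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X, y, Xq, lengthscale, noise=1e-4):
    # Zero-mean GP regression: posterior mean and standard deviation at Xq.
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return Ks.T @ alpha, np.sqrt(var)

def toy_accuracy(embedding, target):
    # ASSUMPTION: synthetic score in (0, 1]. In the real setting this would
    # decode the embedding to a prompt, query the base LLM, run the generated
    # code, and return the fraction of test cases passed.
    return float(np.exp(-np.sum((embedding - target) ** 2) / embedding.size))

def bo_search(embed_dim=64, low_dim=4, n_init=5, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection: search in low_dim, evaluate in embed_dim.
    P = rng.normal(size=(embed_dim, low_dim)) / np.sqrt(low_dim)
    target = rng.uniform(-1, 1, size=embed_dim)  # hidden optimum of the toy score
    # ASSUMPTION: lengthscale grows with sqrt(dimension) in place of a prior.
    lengthscale = 0.5 * np.sqrt(low_dim)

    Z = rng.uniform(-1, 1, size=(n_init, low_dim))
    y = np.array([toy_accuracy(P @ z, target) for z in Z])
    history = [y.max()]
    for _ in range(n_iter):
        cand = rng.uniform(-1, 1, size=(256, low_dim))
        mean, std = gp_posterior(Z, y - y.mean(), cand, lengthscale)
        z_next = cand[np.argmax(mean + 2.0 * std)]  # upper confidence bound
        Z = np.vstack([Z, z_next])
        y = np.append(y, toy_accuracy(P @ z_next, target))
        history.append(y.max())
    return Z[np.argmax(y)], history

if __name__ == "__main__":
    z_best, history = bo_search()
    print("best score per iteration:", np.round(history, 3))
```

The acquisition step here is an upper confidence bound maximized over random candidates, chosen only for brevity; the abstract does not state which acquisition function or surrogate fitting procedure BODE-GEN uses.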

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms