From Words to Code: Harnessing Data for Program Synthesis from Natural Language
Anirudh Khatry, Joyce Cahoon, Jordan Henkel, Shaleen Deep, Venkatesh Emani, Avrilia Floratou, Sumit Gulwani, Vu Le, Mohammad Raza, Sherry Shi, Mukul Singh, Ashish Tiwari

TL;DR
This paper enhances program synthesis from natural language by leveraging data context and output-based reranking of LLM-generated code, achieving significant accuracy improvements across multiple domains.
Contribution
It introduces semantic reranking and temperature mixing, techniques that use the outputs of executing candidate programs on the input data to improve the accuracy of LLM-generated code.
Findings
Up to 45% improvement in top-1 accuracy.
Up to 34% improvement in top-3 accuracy.
Effective across SQL, Pandas, and Power Query M domains.
Abstract
Creating programs to correctly manipulate data is a difficult task, as the underlying programming languages and APIs can be challenging to learn for many users who are not skilled programmers. Large language models (LLMs) demonstrate remarkable potential for generating code from natural language, but in the data manipulation domain, apart from the natural language (NL) description of the intended task, we also have the dataset on which the task is to be performed, or the "data context". Existing approaches have utilized data context in a limited way by simply adding relevant information from the input data into the prompts sent to the LLM. In this work, we utilize the available input data to execute the candidate programs generated by the LLMs and gather their outputs. We introduce semantic reranking, a technique to rerank the programs generated by LLMs based on three signals coming…
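The idea of executing candidate programs on the input data and reranking them by their outputs can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name its three signals in the text above, so the two signals used here (whether a candidate runs without error, and how many other candidates produce the same output) are assumptions chosen for illustration. The helper names `run_candidate` and `rerank`, the toy table, and the model scores are all hypothetical.

```python
from collections import Counter

# Names a candidate expression is allowed to use (hypothetical whitelist).
SAFE_NAMES = {"sum": sum, "len": len, "max": max, "min": min}


def run_candidate(code, table):
    """Execute one candidate expression against the input table.

    Returns (ok, output); ok is False if the candidate raised an exception.
    eval keeps the sketch short; a real system would sandbox execution.
    """
    env = {"__builtins__": {}, "table": table, **SAFE_NAMES}
    try:
        return True, eval(code, env)
    except Exception:
        return False, None


def rerank(candidates, table):
    """Rerank (code, model_score) pairs using execution outputs.

    Assumed signals, in priority order:
      1. the candidate executes without error,
      2. how many successful candidates share its output,
      3. the original model score as a tie-breaker.
    """
    results = [(code, score) + run_candidate(code, table)
               for code, score in candidates]
    # Count how many successful candidates produced each distinct output.
    votes = Counter(repr(out) for _, _, ok, out in results if ok)

    def sort_key(item):
        _, score, ok, out = item
        agreement = votes[repr(out)] if ok else 0
        return (ok, agreement, score)

    ranked = sorted(results, key=sort_key, reverse=True)
    return [code for code, _, _, _ in ranked]


# Toy data context and candidates for the task "total sales".
table = [
    {"city": "Oslo", "sales": 10},
    {"city": "Rome", "sales": 30},
    {"city": "Oslo", "sales": 5},
]
candidates = [
    ("sum(r['sale'] for r in table)", 0.9),     # top model score, wrong column
    ("max(r['sales'] for r in table)", 0.7),    # runs, but computes a maximum
    ("sum(r['sales'] for r in table)", 0.6),    # correct
    ("sum([r['sales'] for r in table])", 0.5),  # correct, different syntax
]
ranked = rerank(candidates, table)
print(ranked[0])
```

In this toy run the candidate with the highest model score fails on the actual data (the column `sale` does not exist) and drops to the bottom, while the two syntactically different candidates that agree on the output move to the top. This is the kind of information that a prompt-only use of the data context cannot provide.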
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
