Effective LLM-Driven Code Generation with Pythoness
Kyla H. Levin, Kyle Gwilt, Emery D. Berger, Stephen N. Freund

TL;DR
Pythoness is an embedded DSL that enables developers to specify behavioral tests at a high level, guiding LLM-based code generation to produce more reliable and correct code while reducing associated risks.
Contribution
The paper introduces Pythoness, an approach that uses behavioral specifications to guide LLM-driven code generation, with the aim of improving code quality and safety.
Findings
Pythoness successfully generates code that passes specified tests.
The approach yields higher-quality code than specifications alone.
Guided code generation reduces risks associated with LLM-produced code.
Abstract
The advent of large language models (LLMs) has paved the way for a new era of programming tools with both significant capabilities and risks, as the generated code lacks guarantees of correctness and reliability. Developers using LLMs currently face the difficult task of optimizing, integrating, and maintaining code generated by AI. We propose an embedded domain-specific language (DSL), Pythoness, to address those challenges. In Pythoness, developers program with LLMs at a higher level of abstraction. Rather than interacting directly with generated code, developers using Pythoness operate at the level of behavioral specifications when writing functions, classes, or an entire program. These specifications can take the form of unit tests and property-based tests, which may be expressed formally or in natural language. Guided by these specifications, Pythoness generates code that both…
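The idea of test-guided generation described in the abstract can be illustrated with a minimal sketch. It does not use the Pythoness API: the LLM is replaced by a fixed sequence of candidate functions, and every name here (`candidate_implementations`, `unit_test`, `property_test`, `generate`) is hypothetical. The sketch only shows the general loop of proposing code and accepting it once all behavioral specifications pass.

```python
from collections import Counter
from itertools import product
from typing import Callable, Iterable, Iterator

SortFn = Callable[[list[int]], list[int]]


def candidate_implementations() -> Iterator[SortFn]:
    """Hypothetical stand-in for an LLM: yields candidate functions in order."""
    # First "generation": looks plausible but silently drops duplicates.
    yield lambda xs: sorted(set(xs))
    # Second "generation": a correct sort.
    yield lambda xs: sorted(xs)


def unit_test(f: SortFn) -> bool:
    """A single concrete example the generated code must satisfy."""
    return f([3, 1, 2]) == [1, 2, 3]


def property_test(f: SortFn) -> bool:
    """Property: output is ordered and is a permutation of the input.

    Checked exhaustively over all lists of length 0..3 with values 0..2,
    so the outcome is deterministic.
    """
    for length in range(4):
        for xs in product(range(3), repeat=length):
            out = f(list(xs))
            ordered = all(a <= b for a, b in zip(out, out[1:]))
            if not ordered or Counter(out) != Counter(xs):
                return False
    return True


def generate(candidates: Iterable[SortFn],
             tests: list[Callable[[SortFn], bool]]) -> tuple[SortFn, int]:
    """Return the first candidate passing every test, plus the attempt count."""
    for attempt, impl in enumerate(candidates, start=1):
        if all(test(impl) for test in tests):
            return impl, attempt
    raise RuntimeError("no candidate satisfied the specification")


impl, attempts = generate(candidate_implementations(), [unit_test, property_test])
print(attempts, impl([2, 2, 1]))
```

In this sketch the first candidate passes the unit test but fails the property test, because it discards repeated elements, so the loop moves on to the second candidate. This is the sense in which tests, rather than a description alone, steer which generated code is accepted.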
Taxonomy
Topics: Mathematics, Computing, and Information Processing
