Diffusion On Syntax Trees For Program Synthesis

Shreyas Kapur; Erik Jenner; Stuart Russell

arXiv:2405.20519·cs.AI·June 3, 2024

TL;DR

This paper introduces neural diffusion models that operate on syntax trees for program synthesis. The models edit code iteratively while keeping it syntactically valid, and the paper demonstrates them on inverse graphics and sketch-based program generation.

Contribution

It presents a novel diffusion-based approach on syntax trees for program synthesis, integrating editing and search to improve code generation and debugging.

Findings

1. Effective in inverse graphics tasks that convert images to programs
2. Able to generate graphics programs from hand-drawn sketches
3. Enhances program synthesis with syntax-preserving iterative editing

Abstract

Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion models, our method also inverts "noise" applied to syntax trees. Rather than generating code sequentially, we iteratively edit it while preserving syntactic validity, which makes it easy to combine this neural model with search. We apply our approach to inverse graphics tasks, where our model learns to convert images into programs that produce those images. Combined with search, our model is able to write graphics programs, see the execution result, and debug them to meet the required…
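The edit-based idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a toy arithmetic grammar stands in for the graphics languages, "noise" is modeled as replacing a random subtree with a small random one, and a greedy accept/reject loop driven by the execution result stands in for the learned neural model and its search. All function names (`sample_expr`, `mutate`, `edit_search`, and so on) are hypothetical.

```python
import random

# Toy grammar (an assumption for illustration, not the paper's DSL):
#   Expr -> int | ("add", Expr, Expr) | ("mul", Expr, Expr)

def sample_expr(rng, depth=2):
    """Sample a random Expr; every result is valid under the toy grammar."""
    if depth == 0 or rng.random() < 0.4:
        return rng.randint(0, 9)
    op = rng.choice(["add", "mul"])
    return (op, sample_expr(rng, depth - 1), sample_expr(rng, depth - 1))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path swapped for new_subtree."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(children)

def mutate(tree, rng):
    """One 'noise' step: replace a random subtree with a small random Expr.

    The replacement is drawn from the same nonterminal, so the edited
    tree is still a valid program.
    """
    path = rng.choice(list(paths(tree)))
    return replace_at(tree, path, sample_expr(rng, depth=1))

def evaluate(tree):
    """Execute the program: here, simply compute the expression's value."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "add" else a * b

def is_valid(tree):
    """Check that a tree conforms to the toy grammar."""
    if isinstance(tree, int):
        return True
    return (isinstance(tree, tuple) and len(tree) == 3
            and tree[0] in ("add", "mul")
            and is_valid(tree[1]) and is_valid(tree[2]))

def edit_search(start, target, rng, steps=500, max_nodes=31):
    """Greedy stand-in for the learned editor (an assumption, not the
    paper's method): propose an edit, run the program, and keep the edit
    only if the output is no further from the target."""
    current = start
    for _ in range(steps):
        if evaluate(current) == target:
            break
        candidate = mutate(current, rng)
        if sum(1 for _ in paths(candidate)) > max_nodes:
            continue  # keep programs small
        if abs(evaluate(candidate) - target) <= abs(evaluate(current) - target):
            current = candidate
    return current
```

Calling `edit_search(("add", 1, 1), 42, random.Random(0))` returns a tree whose value is at least as close to 42 as the starting program's, and every intermediate program is syntactically valid because each edit swaps one well-formed subtree for another. The loop is not guaranteed to hit the target exactly. In the paper, the random proposals are replaced by a trained model conditioned on the current program's output and the target.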

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The method is simple but non-obvious.
- The more general problem of program synthesis conditioned on desired outputs is very relevant.
- The authors use randomly generated programs as a dataset, which sidesteps dataset curation in favor of just a specification of the language.
- The paper is well-written, easy to understand, and has nice and (mostly) clear figures.

Weaknesses

- The paper is somewhat limited in scope (simple problem setup) in ways that make it not entirely obvious how the method "scales" to more complex relevant tasks like code generation.
- Some minor things covered in Questions.

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- Innovative Approach: The paper presents a novel combination of autoregressive, diffusion, and search methodologies, which, despite being applied to a specific domain, holds potential for broader applications. The reverse mutation path algorithm also provides an efficient way to generate training targets.
- Clarity and Replicability: The manuscript is well-written and easy to follow, providing sufficient detail to enable replication of the experiments.
- Comprehensive Ablation Studies: The aut

Weaknesses

- Literature Coverage: The authors should consider citing "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation" in the Neural program synthesis section, since this work also takes multiple passes of the program and edits the program.
- The value network (vϕ) training and effectiveness aren't thoroughly evaluated. Alternative approaches to edit distance estimation, including direct calculation from syntax trees, are not explored or compared.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The main strength of this paper is the design of a neurosymbolic framework to evaluate the automated (i.e. diffusion-based) conversion of images into context-free grammar. This formal evaluation ensures that the desired specifications are met through iterative observation of the execution results and verification.
2. The authors extend the approach to accept hand-drawn sketches and illustrate examples in the appendix confirming the applicability of the approach in several real-world settings

Weaknesses

There are three main weaknesses I would like to bring up. The authors are encouraged to rebut and provide legitimate explanations, if any, against these and the review decision may be adjusted accordingly.
1. A claim made by the author states that the proposed method focuses on editing the program synthesized from the image, unlike prior works that autoregressively generate programs that are incrementally better. In doing so, the authors propose adding random noise to modify a base syntax tree

Videos

Diffusion On Syntax Trees For Program Synthesis · slideslive

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Logic, programming, and type systems

Methods: Diffusion