Energy-Based Models for Code Generation under Compilability Constraints
Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

TL;DR
This paper introduces an energy-based modeling approach for generating compilable source code. It frames compilability as a constraint satisfaction problem and uses a policy gradient method to raise compilability rates without reducing the diversity of generated samples.
Contribution
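It defines an energy-based model that constrains a pre-trained generative model to compilable sequences, and trains a generative model to approximate it with the KL-Adaptive Distributional Policy Gradient algorithm. This targets a sequence-level property that the standard autoregressive training objective overlooks.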
Findings
Improved compilability rates in generated code
Maintained diversity and complexity of outputs
Demonstrated effectiveness of energy-based constraints
Abstract
Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.
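The constraint described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the energy-based model takes the product form P(x) = a(x) · b(x), where a(x) is the probability under the pre-trained model and b(x) is 1 for compilable sequences and 0 otherwise. It also uses Python's built-in compile() as a stand-in compilability check. The names is_compilable, ebm_score, normalize, and toy_base are hypothetical, and the probabilities in toy_base are made up for illustration.

```python
def is_compilable(source: str) -> bool:
    """Binary constraint b(x): True if Python's built-in compile() accepts the source.

    Assumption: this stands in for whatever compilability check is used in practice.
    """
    try:
        compile(source, "<sample>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def ebm_score(source: str, base_prob: float) -> float:
    """Unnormalized score P(x) = a(x) * b(x) (assumed product form)."""
    return base_prob * (1.0 if is_compilable(source) else 0.0)


def normalize(base: dict[str, float]) -> tuple[dict[str, float], float]:
    """Return the normalized target p(x) = P(x) / Z and the partition function Z."""
    scores = {x: ebm_score(x, a) for x, a in base.items()}
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError("no compilable sample has nonzero base probability")
    return {x: s / z for x, s in scores.items()}, z


# Hypothetical base distribution a(x) over three candidate samples.
toy_base = {
    "def f(x):\n    return x + 1\n": 0.5,  # compilable
    "def f(x) return x\n": 0.3,            # syntax error
    "print('hi')\n": 0.2,                  # compilable
}

target, z = normalize(toy_base)
for src, p in target.items():
    print(f"{p:.3f}  {src!r}")
```

In this toy setting Z equals the base model's compilability rate, and the target distribution keeps the relative probabilities of the compilable samples unchanged while assigning zero mass to the rest. Over real sequence spaces Z cannot be enumerated this way, which is why a generative model is trained to approximate the EBM.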
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: energy-based model
