Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia

TL;DR
This paper introduces SPEAC, a novel method that uses an intermediate language and compiler techniques to enable large language models to generate syntactically valid code in very low-resource programming languages, improving correctness in formal verification tasks.
Contribution
The paper proposes synthetic programming elicitation and compilation (SPEAC), a new approach that facilitates code generation in very low-resource programming languages (VLPLs) by leveraging an intermediate language and compiler-based repair.
Findings
SPEAC increases syntactic correctness of generated code in VLPLs.
SPEAC outperforms retrieval and fine-tuning baselines.
SPEAC maintains semantic correctness while improving syntax validity.
Abstract
Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource programming languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, toolchains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to…
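The pipeline described in the abstract (an intermediate language the model already knows, a membership check, and automatic compilation to the target) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the intermediate language is a tiny subset of Python, the target syntax (`x := ...;`) is a made-up toy language, and the names `ALLOWED_NODES`, `out_of_subset`, `emit_expr`, `emit_stmt`, and `translate` are hypothetical. The repair step is not sketched, because the abstract above is truncated before it describes that step; this sketch only reports which constructs fall outside the subset.

```python
import ast

# Hypothetical intermediate language: a tiny subset of Python (an assumption,
# not the paper's actual design). Only these AST node types are allowed.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Assert, ast.Name, ast.Constant,
    ast.BinOp, ast.Compare, ast.Add, ast.Sub, ast.Mult,
    ast.Eq, ast.Lt, ast.Gt, ast.Load, ast.Store,
)

BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
CMP_OPS = {ast.Eq: "==", ast.Lt: "<", ast.Gt: ">"}


def out_of_subset(tree):
    """Return sorted names of node types that lie outside the subset."""
    return sorted({type(node).__name__ for node in ast.walk(tree)
                   if not isinstance(node, ALLOWED_NODES)})


def emit_expr(node):
    """Translate an in-subset expression into the toy target syntax."""
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return str(node.value)
    if isinstance(node, ast.BinOp):
        op = BIN_OPS[type(node.op)]
        return f"({emit_expr(node.left)} {op} {emit_expr(node.right)})"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = CMP_OPS[type(node.ops[0])]
        return f"({emit_expr(node.left)} {op} {emit_expr(node.comparators[0])})"
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def emit_stmt(stmt):
    """Translate an in-subset statement into one line of the toy target."""
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        return f"{stmt.targets[0].id} := {emit_expr(stmt.value)};"
    if isinstance(stmt, ast.Assert) and stmt.msg is None:
        return f"assert {emit_expr(stmt.test)};"
    raise ValueError(f"unsupported statement: {ast.dump(stmt)}")


def translate(source):
    """Return (True, target_lines) if the code is in the subset,
    otherwise (False, offending_node_type_names)."""
    tree = ast.parse(source)
    offending = out_of_subset(tree)
    if offending:
        return False, offending
    return True, [emit_stmt(stmt) for stmt in tree.body]


if __name__ == "__main__":
    print(translate("x = 1 + 2\nassert x == 3"))
    print(translate("for i in range(3):\n    x = i"))
```

In this sketch, code inside the subset compiles line by line, while code that uses a loop or a call is flagged with the offending node types. A repair stage would take that list as its input.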
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Embedded Systems Design Techniques
