Guess & Sketch: Language Model Guided Transpilation

Celine Lee; Abdulrahman Mahmoud; Michal Kurek; Simone Campanoni; David Brooks; Stephen Chong; Gu-Yeon Wei; Alexander M. Rush

arXiv:2309.14396·cs.SE·March 18, 2024

TL;DR

This paper introduces Guess & Sketch, a neurosymbolic approach combining language models and symbolic solvers to improve assembly code transpilation, outperforming GPT-4 and traditional methods in accuracy.

Contribution

It proposes a novel neurosymbolic method for assembly code transpilation that leverages language models and symbolic solvers to enhance correctness and scalability.
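
As a rough illustration of the guess-then-repair idea, the toy sketch below follows two points made in the reviews on this page: model uncertainty is used to identify holes in the generated program, and the result is never worse than the initial guess. Everything concrete here is an assumption made for illustration and is not the authors' implementation: the three-instruction accumulator language, the 0.5 confidence threshold, the candidate set, the input/output tests used as the correctness check, and the brute-force enumeration that stands in for a symbolic solver.

```python
from itertools import product

# Toy target language (hypothetical): a program is a list of (op, constant)
# instructions applied in order to a single accumulator.
OPS = {
    "add": lambda acc, k: acc + k,
    "sub": lambda acc, k: acc - k,
    "mul": lambda acc, k: acc * k,
}

def run(program, x):
    """Execute a toy program on input x and return the final accumulator."""
    acc = x
    for op, k in program:
        acc = OPS[op](acc, k)
    return acc

def passes(program, tests):
    """Check a program against (input, expected_output) pairs."""
    return all(run(program, x) == y for x, y in tests)

def find_holes(confidences, threshold=0.5):
    """Positions whose model confidence is below the (assumed) threshold."""
    return [i for i, c in enumerate(confidences) if c < threshold]

def guess_and_sketch(guess, confidences, candidates, tests, threshold=0.5):
    """Keep confident instructions fixed and search only over the holes.

    Exhaustive enumeration stands in for a symbolic solver here. If no
    filling passes the tests, the initial guess is returned unchanged.
    """
    if passes(guess, tests):
        return guess
    holes = find_holes(confidences, threshold)
    for choice in product(candidates, repeat=len(holes)):
        trial = list(guess)
        for idx, instr in zip(holes, choice):
            trial[idx] = instr
        if passes(trial, tests):
            return trial
    return guess  # fall back to the initial guess

# Reference behaviour: f(x) = (x + 3) * 2, given as input/output pairs.
tests = [(0, 6), (1, 8), (2, 10)]
guess = [("add", 3), ("mul", 3)]   # second instruction is wrong
confidences = [0.95, 0.40]         # and the (made-up) model is unsure about it
candidates = [(op, k) for op in OPS for k in range(1, 5)]

repaired = guess_and_sketch(guess, confidences, candidates, tests)
print(repaired)  # [('add', 3), ('mul', 2)]
```

Restricting the search to low-confidence positions is what keeps it small: with one hole and 12 candidates the loop tries at most 12 programs, whereas searching over both positions would try up to 144.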

Findings

01. Transpiles 57.6% more examples than GPT-4.

02. Transpiles 39.6% more examples than an engineered transpiler.

03. Provides a new dataset for assembly transpilation tasks.

Abstract

Maintaining legacy software requires many software and systems engineering hours. Assembly code programs, which demand low-level control over the computer machine state and have no variable names, are particularly difficult for humans to analyze. Existing conventional program translators guarantee correctness, but are hand-engineered for the source and target programming languages in question. Learned transpilation, i.e. automatic translation of code, offers an alternative to manual re-writing and engineering efforts. Automated symbolic program translation approaches guarantee correctness but struggle to scale to longer programs due to the exponentially large search space. Their rigid rule-based systems also limit their expressivity, so they can only reason about a reduced space of programs. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so…

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

- While a simple concept, the method outperforms prior work
- The concept of uncertainty is a good mapping to identify holes in the generated program
- Evaluation is robust and thorough, providing analysis of failure cases
- Authors identify a setting for Neuro-symbolic approaches to work stably and outperform prior works

Weaknesses

- Novelty within this approach is quite limited: the translation is a standard approach, the confidence is simple (see below), and they use an existing neuro-symbolic solver, so the contribution rests mainly on the idea of putting these together. This is the main criticism. However, they outperform prior work, and the idea is interesting and technically sound.
- Confidence is very trivially explained. In general, deep models are very confident even when they are wrong. It isn't clear how this was implem

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 2

Strengths

The paper is well written and presents a clear contribution. The combination of generative language models and program synthesis by sketching is new and it is shown to be effective as compared to state of the art techniques.

Weaknesses

I could not understand the correctness guarantee provided by the approach. The authors say "the correctness of GUESS & SKETCH is always lower-bounded by the correctness of the initial guess"; the authors should explain what they mean by lower bound here. If the translation is incorrect, how can it be useful in practice?

The scalability is unclear. What is the largest program that has been translated using the approach presented here?

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

## the good part of quality: that it worked

The presented method works, on a domain of highly structured translation task (i.e. highly stylized texts), something a language model should perform very well at, and it shows. The extra care taken to correct the translation locally is a reasonable yet good idea to complement the weakness of the language model. The benchmark is thorough, and the evaluation (on what is being shown) is solid.

## clarity

I am very grateful how this work is able to enc

Weaknesses

## the not so good part of quality:

### evaluation set is small

This work can be significantly beefed up with a synthetic test set. Evaluation on mere 100s of programs is likely not sufficient. Since it is possible to compile C into both architectures, and since test generation / fuzzing is a well established approach, this work can benefit from an artificial/synthetic test set consisting of about 1k programs, to evaluate the correctness of the transpiler more thoroughly.

### lack of statisti

Videos

Guess & Sketch: Language Model Guided Transpilation · slideslive

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Layer Normalization · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Dense Connections