Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

Zongjie Li; Pingchuan Ma; Huaijin Wang; Shuai Wang; Qiyi Tang; Sen Nie; Shi Wu

arXiv:2204.09191·cs.SE·April 21, 2022

Open Access

TL;DR

This paper uses compiler intermediate representation (LLVM IR) and genetic algorithms to improve neural program embeddings for program analysis tasks.

Contribution

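The paper introduces two ways to enhance program embeddings: training embedding models on LLVM IR alongside source code, and IRGen, a genetic-algorithm framework that searches for (near-)optimal sequences of optimization flags.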

Findings

1. Embedding quality improves with IR-based training.

2. Genetic algorithms can find near-optimal optimization flags.

3. Enhanced embeddings benefit program analysis tasks.

Abstract

Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural program embeddings directly from program source code, learning from features such as tokens, abstract syntax trees, and control flow graphs. This paper takes a fresh look at how to improve program embeddings by leveraging compiler intermediate representation (IR). We first demonstrate simple yet highly effective methods for enhancing embedding quality by training embedding models alongside source code and LLVM IR generated by default optimization levels (e.g., -O2). We then introduce IRGen, a framework based on genetic algorithms (GA), to identify (near-)optimal sequences of optimization flags that can significantly improve embedding quality.
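To make the genetic-algorithm idea in the abstract concrete, here is a minimal Python sketch of a GA searching over sequences of optimization flags. It is not IRGen itself: the flag names in `FLAG_POOL` are placeholders rather than real compiler flags, `toy_fitness` is a hypothetical stand-in for whatever embedding-quality measure the framework uses, and the selection, crossover, and mutation choices are generic textbook ones assumed for illustration.

```python
import random
from typing import Callable, List, Sequence

# Placeholder flag names (assumption): NOT real compiler flags.
FLAG_POOL = ["flag_a", "flag_b", "flag_c", "flag_d", "flag_e", "flag_f"]

Fitness = Callable[[Sequence[str]], float]


def toy_fitness(seq: Sequence[str]) -> float:
    """Hypothetical stand-in score: rewards sequences with distinct flags.

    In a real setting this would compile programs with the flag sequence
    and score the resulting embeddings; that step is not modeled here.
    """
    return float(len(set(seq)))


def evolve(
    flag_pool: Sequence[str],
    fitness: Fitness,
    seq_len: int = 4,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[str]:
    """Return the best flag sequence found by a simple elitist GA."""
    rng = random.Random(seed)
    # Start from random flag sequences of fixed length.
    population = [
        [rng.choice(flag_pool) for _ in range(seq_len)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            # One-point crossover between two surviving parents.
            cut = rng.randrange(1, seq_len)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: occasionally swap a flag for a random one.
            child = [
                rng.choice(flag_pool) if rng.random() < mutation_rate else flag
                for flag in child
            ]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


best = evolve(FLAG_POOL, toy_fitness)
print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score never decreases between generations. In practice the expensive part would be the fitness evaluation, so a real search would likely cache scores per flag sequence.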

Taxonomy

Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications