Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

Jordan Henkel; Shuvendu K. Lahiri; Ben Liblit; Thomas Reps

arXiv:1803.06686 · cs.SE · August 21, 2018

TL;DR

This paper introduces an approach to program understanding that learns word embeddings from abstractions of symbolic-execution traces. The embeddings reach 93% top-1 accuracy on an API-usage analogy benchmark, and embeddings built on semantic abstractions outperform those built on syntactic ones.

Contribution

The paper presents a method for representing programs as word embeddings learned from abstractions of symbolic-execution traces, and shows that semantic (rather than syntactic) abstractions yield more accurate embeddings.
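Word-embedding learners (word2vec- and GloVe-style models alike) consume co-occurrence information drawn from a sliding window over "sentences." A minimal sketch, assuming each abstracted trace is treated as one sentence of abstraction tokens: the function below emits the (center, context) pairs such a learner would see. The token names (`Called(...)`, `RetEq(...)`, `Error(...)`) and the helper `skipgram_pairs` are hypothetical illustrations, not the paper's actual abstraction vocabulary or training code.

```python
from typing import List, Tuple

# Hypothetical abstracted symbolic traces: each trace is a sequence of
# abstraction tokens. These token names are illustrative only.
TRACES: List[List[str]] = [
    ["Called(mutex_lock)", "Called(do_work)", "Called(mutex_unlock)"],
    ["Called(kmalloc)", "RetEq(0)", "Error(ENOMEM)"],
]


def skipgram_pairs(trace: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(trace):
        # Clamp the window to the bounds of the trace.
        lo = max(0, i - window)
        hi = min(len(trace), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, trace[j]))
    return pairs


if __name__ == "__main__":
    for trace in TRACES:
        for center, context in skipgram_pairs(trace):
            print(f"{center} -> {context}")
```

Because the tokens describe what a path through the program does (calls made, return values checked), tokens that play similar roles across many traces end up with similar contexts, which is what lets an embedding learner place them near each other.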

Findings

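01. Achieved 93% top-1 accuracy on an API-usage analogy benchmark of over 19,000 analogies extracted from the Linux kernel.
02. Embeddings learned from (mainly) semantic abstractions significantly outperform those learned from (mainly) syntactic ones.
03. Embeddings trained under hundreds of parameterizations were evaluated on a suite of tasks and perform robustly across them.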

Abstract

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly…
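To make "top-1 accuracy on API-usage analogies" concrete: a common way to answer an analogy a : b :: c : ? with embeddings is the vector-offset (3CosAdd) method, which picks the vocabulary word whose vector has the highest cosine similarity to b − a + c, excluding the three query words. A minimal sketch, assuming toy 2-D vectors that are made up for illustration (the kernel-style API names are just labels); whether the paper scores its analogies with exactly this rule is an assumption, and `solve_analogy` / `top1_accuracy` are hypothetical helpers.

```python
import numpy as np

# Toy 2-D embeddings with hypothetical values chosen for illustration only.
EMB = {
    "mutex_lock": np.array([1.0, 0.0]),
    "mutex_unlock": np.array([1.0, 1.0]),
    "spin_lock": np.array([2.0, 0.0]),
    "spin_unlock": np.array([2.0, 1.0]),
    "kmalloc": np.array([0.0, 3.0]),
}


def solve_analogy(a, b, c, emb):
    """Answer a : b :: c : ? with the vector-offset (3CosAdd) method."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):  # exclude the query words, as is conventional
            continue
        denom = np.linalg.norm(vec) * np.linalg.norm(target)
        if denom == 0:  # cosine similarity is undefined for zero vectors
            continue
        sim = float(np.dot(vec, target) / denom)
        if sim > best_sim:
            best, best_sim = word, sim
    return best


def top1_accuracy(analogies, emb):
    """Fraction of (a, b, c, d) tuples whose single best answer equals d."""
    hits = sum(solve_analogy(a, b, c, emb) == d for a, b, c, d in analogies)
    return hits / len(analogies)


if __name__ == "__main__":
    print(solve_analogy("mutex_lock", "mutex_unlock", "spin_lock", EMB))
```

"Top-1" means only the single highest-scoring candidate counts as an answer; a benchmark score is then the fraction of analogies where that candidate matches the expected API.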

Code & Models

Repositories

jjhenkel/code-vectors-artifact (official; TensorFlow)