COSET: A Benchmark for Evaluating Neural Program Embeddings
Ke Wang, Mihai Christodorescu

TL;DR
This paper introduces COSET, a benchmarking framework that pairs a diverse, expert-labeled dataset of programs with a suite of program transformations, for evaluating how well neural program embeddings capture semantics and for exposing each model's strengths and limitations.
Contribution
The paper presents COSET, a novel benchmark with program transformations for evaluating neural embeddings of code, emphasizing semantic modeling and model debugging.
Findings
COSET effectively identifies model strengths and weaknesses.
Transformations simulate real-world code changes.
Pilot study compares four neural models.
Abstract
Neural program embedding can be helpful in analyzing large software, a task that is challenging for traditional logic-based program analyses due to their limited scalability. A key focus of recent machine-learning advances in this area is on modeling program semantics instead of just syntax. Unfortunately, evaluating such advances is not obvious, as program semantics does not lend itself to straightforward metrics. In this paper, we introduce a benchmarking framework called COSET for standardizing the evaluation of neural program embeddings. COSET consists of a diverse dataset of programs in source-code format, labeled by human experts according to a number of program properties of interest. A point of novelty is a suite of program transformations included in COSET. These transformations, when applied to the base dataset, can simulate natural changes to program code due to optimization and…
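The idea of applying transformations to a base dataset and checking how a model reacts can be sketched in a few lines. The example below is only an illustration under stated assumptions, not COSET's actual transformation suite or API: `rename_variables`, `surface_model`, `prediction_stability`, `run`, and the two toy programs are all hypothetical names invented here, and variable renaming is chosen simply because it is an easy edit that leaves behavior unchanged. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast

def rename_variables(source):
    """Hypothetical transformation: rename parameters and assigned variables to v0, v1, ...
    Toy-grade only: it ignores keyword arguments, globals, and name collisions."""
    tree = ast.parse(source)
    mapping = {}
    # First pass: collect every name that is bound as a parameter or assignment target.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            mapping.setdefault(node.arg, f"v{len(mapping)}")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            mapping.setdefault(node.id, f"v{len(mapping)}")
    # Second pass: rewrite all occurrences of the collected names.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]
    return ast.unparse(tree)

def surface_model(source):
    """Deliberately brittle stand-in for a learned model: it keys on an identifier name."""
    return "sum" if "total" in source else "other"

def prediction_stability(model, programs, transform):
    """Fraction of programs whose predicted label is unchanged by the transformation."""
    unchanged = sum(model(p) == model(transform(p)) for p in programs)
    return unchanged / len(programs)

def run(source, func_name, arg):
    """Execute a toy program and call one of its functions, to check behavior is preserved."""
    env = {}
    exec(source, env)
    return env[func_name](arg)

# Two assumed toy programs standing in for a labeled base dataset.
PROGRAMS = [
    "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n",
    "def g(xs):\n    best = xs[0]\n    for x in xs:\n        if x > best:\n            best = x\n    return best\n",
]

if __name__ == "__main__":
    print(rename_variables(PROGRAMS[0]))
    print(prediction_stability(surface_model, PROGRAMS, rename_variables))
```

Here the renamed programs compute the same results as the originals, yet the stand-in model changes its answer on one of them because it depended on the identifier `total`. A model that had captured semantics rather than surface syntax would be expected to keep its prediction, which is the kind of weakness a transformation-based benchmark is meant to surface.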
Taxonomy
Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques
Methods: Graph Neural Network
