TRACED: Execution-aware Pre-training for Source Code
Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

TL;DR
TRACED is an execution-aware pre-training method for source code models. By incorporating execution traces into pre-training, it improves the models' ability to estimate dynamic program properties and their performance on code understanding tasks.
Contribution
It proposes a pre-training strategy that integrates execution traces into source code models, narrowing the gap between the static nature of code language models and the dynamic behavior of programs.
Findings
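12.4% relative improvement over statically pre-trained code models in complete execution path prediction
25.2% relative improvement over statically pre-trained code models in runtime variable value prediction
Outperforms statically pre-trained models in clone retrieval and vulnerability detection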
Abstract
Most existing pre-trained language models for source code focus on learning the static code text, typically augmented with static code structures (abstract syntax tree, dependency graphs, etc.). However, program semantics will not be fully exposed before the real execution. Without an understanding of the program execution, statically pre-trained models fail to comprehensively capture the dynamic code properties, such as the branch coverage and the runtime variable values, and they are consequently less effective at code understanding tasks, such as retrieving semantic clones and detecting software vulnerabilities. To close the gap between the static nature of language models and the dynamic characteristics of programs, we introduce TRACED, an execution-aware pre-training strategy for source code. Specifically, we pre-train code language models with a combination of source code,…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques
