How could Neural Networks understand Programs?
Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

TL;DR
This paper introduces OSCAR, a hierarchical Transformer model that enhances program understanding by learning from representations aligned with formal operational semantics and environment transitions, surpassing traditional NLP-based approaches.
Contribution
The paper proposes a novel semantics-learning paradigm for code that uses intermediate representation (IR) and static analysis to improve program comprehension beyond existing NLP pre-training methods.
Findings
OSCAR outperforms existing models on multiple software engineering tasks.
Learning from formal semantics improves program understanding.
The hierarchical Transformer effectively captures fundamental program operations.
Abstract
Semantic understanding of programs is a fundamental problem for programming language processing (PLP). Recent works that learn representations of code based on pre-training techniques in NLP have pushed the frontiers in this direction. However, the semantics of PL and NL have essential differences. If these differences are ignored, we believe it is difficult to build a model that better understands programs, whether by directly applying off-the-shelf NLP pre-training techniques to source code or by adding features to the model heuristically. In fact, the semantics of a program can be rigorously defined by formal semantics in PL theory. For example, operational semantics describes the meaning of a valid program as updating the environment (i.e., the memory address-value function) through fundamental operations, such as memory I/O and conditional branching. Inspired by this, we propose a novel…
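To make the operational-semantics view in the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the IR or the model used in the paper: the three opcodes (`store`, `add`, `br_if_lt`), the `step` and `run` helpers, and the `PROGRAM` example are all hypothetical names chosen for this sketch. The environment is a dictionary from a memory address (here simply a name) to a value, and each instruction maps one environment to the next.

```python
from typing import Dict, List, Optional, Tuple

Env = Dict[str, int]      # environment: memory address (a name here) -> value
Instr = Tuple             # (opcode, operand, ...); hypothetical toy instruction set


def step(program: List[Instr], pc: int, env: Env) -> Tuple[int, Env]:
    """Run the instruction at pc; return the next pc and the updated environment."""
    op, *args = program[pc]
    new_env = dict(env)   # copy so every state in the trace stays intact
    if op == "store":     # memory write: env[addr] := constant
        addr, value = args
        new_env[addr] = value
        return pc + 1, new_env
    if op == "add":       # memory read + write: env[dst] := env[a] + env[b]
        dst, a, b = args
        new_env[dst] = env[a] + env[b]
        return pc + 1, new_env
    if op == "br_if_lt":  # conditional branch: jump to target if env[a] < env[b]
        a, b, target = args
        return (target if env[a] < env[b] else pc + 1), new_env
    raise ValueError(f"unknown opcode: {op}")


def run(program: List[Instr], env: Optional[Env] = None,
        max_steps: int = 1000) -> List[Env]:
    """Return the sequence of environments visited while executing the program."""
    state: Env = dict(env or {})
    trace = [dict(state)]
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            break
        pc, state = step(program, pc, state)
        trace.append(state)
    return trace


# Toy program: acc = 0 + 1 + 2, written as a loop over i in [0, n).
PROGRAM: List[Instr] = [
    ("store", "i", 0),
    ("store", "one", 1),
    ("store", "n", 3),
    ("store", "acc", 0),
    ("add", "acc", "acc", "i"),    # loop body starts at index 4
    ("add", "i", "i", "one"),
    ("br_if_lt", "i", "n", 4),     # repeat while i < n
]

trace = run(PROGRAM)
print(trace[-1])
```

Under this view, the meaning of the program is the sequence of environment transitions in `trace`, not the surface tokens of its source text. Two programs that look different but produce the same transitions would be treated as semantically equivalent in this toy setting.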
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling
Methods: OSCAR
