Learning Program Semantics with Code Representations: An Empirical Study
Jing Kai Siow, Shangqing Liu, Xiaofei Xie, Guozhu Meng, Yang Liu

TL;DR
This empirical study systematically evaluates various program representation techniques across multiple code intelligence tasks, revealing the superiority of graph-based methods and the importance of node textual information.
Contribution
The study categorizes and compares four main program representation techniques across three tasks, providing comprehensive insights into their relative effectiveness and task-specific requirements.
Findings
Graph-based representations outperform other techniques.
Node textual information is more critical than node type.
Combining multiple semantics can enhance performance.
Abstract
Learning program semantics is fundamental to various code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable body of existing work proposes diverse approaches to learning program semantics for different tasks, and these works have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. To address this gap, we conduct an empirical study to evaluate different program representation techniques. Specifically, we categorize current mainstream code representation techniques into four categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based program representation techniques, and evaluate their performance on three diverse and popular code intelligence tasks, i.e., Code…
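To make the four categories concrete, the sketch below derives each kind of representation from one toy function using only Python's standard `ast` and `tokenize` modules. It is an illustrative assumption, not the study's implementation: the chosen features, the function names, and the single `next_use` edge type are hypothetical simplifications (graph-based approaches in the literature typically add richer control-flow and data-flow edges on top of the AST).

```python
import ast
import io
import tokenize
from collections import Counter

SNIPPET = "def add(a, b):\n    c = a + b\n    return c\n"

# Context/operator nodes (Load, Store, Add, ...) are shared singletons in
# CPython ASTs, so they are skipped to keep every graph node unique.
_SKIP = (ast.expr_context, ast.operator, ast.boolop, ast.unaryop, ast.cmpop)


def _children(node):
    return [c for c in ast.iter_child_nodes(node) if not isinstance(c, _SKIP)]


def _walk(node):
    # Pre-order traversal over the (filtered) AST.
    yield node
    for child in _children(node):
        yield from _walk(child)


def feature_based(src):
    # Hypothetical hand-crafted metrics: a fixed-size feature vector.
    counts = Counter(type(n).__name__ for n in _walk(ast.parse(src)))
    return {"num_lines": len(src.splitlines()),
            "num_names": counts["Name"],
            "num_binops": counts["BinOp"]}


def sequence_based(src):
    # Flat token stream, dropping layout-only tokens.
    layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
              tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    toks = tokenize.generate_tokens(io.StringIO(src).readline)
    return [t.string for t in toks if t.type not in layout]


def tree_based(src):
    # Parent -> child edges of the AST, labelled by node type.
    return [(type(p).__name__, type(c).__name__)
            for p in _walk(ast.parse(src)) for c in _children(p)]


def graph_based(src):
    # AST "child" edges plus a simplified "next_use" edge linking successive
    # occurrences of the same variable name (a crude data-flow proxy).
    nodes = list(_walk(ast.parse(src)))
    index = {id(n): i for i, n in enumerate(nodes)}
    edges = [(index[id(p)], index[id(c)], "child")
             for p in nodes for c in _children(p)]
    names = sorted((n for n in nodes if isinstance(n, ast.Name)),
                   key=lambda n: (n.lineno, n.col_offset))
    last_seen = {}
    for n in names:
        if n.id in last_seen:
            edges.append((index[id(last_seen[n.id])], index[id(n)], "next_use"))
        last_seen[n.id] = n
    return [type(n).__name__ for n in nodes], edges


if __name__ == "__main__":
    print(feature_based(SNIPPET))
    print(sequence_based(SNIPPET))
    print(tree_based(SNIPPET))
    print(graph_based(SNIPPET))
```

Each step adds structure the previous one discards: the feature vector loses token order, the token sequence loses syntactic nesting, and the tree lacks links between distant uses of the same variable, which the graph's extra edge restores.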
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
