CSSG: Measuring Code Similarity with Semantic Graphs
Yiyang Lu, Jingwen Xu, Changze Lv, Zisu Huang, Zhengkang Guo, Zhengyuan Wang, Muzhao Tian, Xuanjing Huang, Xiaoqing Zheng

TL;DR
CSSG is a semantics-aware code similarity metric built on program dependence graphs. It outperforms existing surface-level and syntax-based metrics at distinguishing more similar code from less similar code in both monolingual and cross-lingual settings.
Contribution
The paper presents CSSG, a novel semantic graph-based metric that captures control dependencies and variable interactions for improved code similarity measurement.
Findings
CSSG outperforms existing metrics on the CodeContests+ dataset.
CSSG effectively distinguishes similar and dissimilar code in monolingual and cross-lingual settings.
Dependency-aware graph representations enhance code similarity assessment.
Abstract
Existing code similarity metrics, such as BLEU, CodeBLEU, and TSED, largely rely on surface-level string overlap or abstract syntax tree structures, and often fail to capture deeper semantic relationships between programs. We propose CSSG (Code Similarity using Semantic Graphs), a novel metric that leverages program dependence graphs to explicitly model control dependencies and variable interactions, providing a semantics-aware representation of code. Experiments on the CodeContests+ dataset show that CSSG consistently outperforms existing metrics in distinguishing more similar code from less similar code under both monolingual and cross-lingual settings, demonstrating that dependency-aware graph representations offer a more effective alternative to surface-level or syntax-based similarity measures.
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
