Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks

Jorge Martinez-Gil

arXiv:2410.05275 · cs.IR · April 14, 2025

TL;DR

This paper enhances the interpretability of GraphCodeBERT for code similarity tasks by enabling clearer identification of semantic relationships, aiding developers in understanding and trusting similarity assessments.

Contribution

It introduces a method to improve the transparency of code similarity detection using GraphCodeBERT, balancing semantic accuracy with interpretability.

Findings

01. Improved interpretability of code similarity results.
02. Enhanced understanding of semantic relationships in code.
03. Open-source implementation available.

Abstract

Assessing the degree of similarity of code fragments is crucial for ensuring software quality, but it remains challenging due to the need to capture the deeper semantic aspects of code. Traditional syntactic methods often fail to identify these connections. Recent advancements have addressed this challenge, though they frequently sacrifice interpretability. To address this, we present an approach that improves the transparency of similarity assessment by using GraphCodeBERT, which enables the identification of semantic relationships between code fragments. This approach identifies similar code fragments and clarifies the reasons behind that identification, helping developers better understand and trust the results. The source code for our implementation is available at https://www.github.com/jorge-martinez-gil/graphcodebert-interpretability.
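
The abstract does not spell out how the explanations are produced. As an illustration only, one common way to make an embedding-based similarity score explainable is to pair a pooled-embedding cosine score with token-level alignments that show which tokens of one fragment best match tokens of the other. The minimal sketch below assumes this generic technique; it is not necessarily the method used in the linked repository. In a real pipeline the token vectors could be the last hidden states of the `microsoft/graphcodebert-base` checkpoint loaded through Hugging Face `transformers`. Here a hypothetical three-dimensional lookup table, `TOY_EMBEDDINGS`, stands in for them so the example runs offline, and `explain_similarity` is an illustrative name, not an API from the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical stand-in for contextual token embeddings; a real pipeline would
# take these from a GraphCodeBERT encoder rather than a fixed lookup table.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "def": [1.0, 0.0, 0.0],
    "sum": [0.0, 1.0, 0.0],
    "total": [0.0, 0.9, 0.1],
    "return": [0.0, 0.0, 1.0],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average token vectors into one fragment-level vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def explain_similarity(
    tokens_a: List[str], tokens_b: List[str]
) -> Tuple[float, List[Tuple[str, str, float]]]:
    """Return a fragment-level score plus, for each token of A, its best match in B."""
    vecs_a = [TOY_EMBEDDINGS[t] for t in tokens_a]
    vecs_b = [TOY_EMBEDDINGS[t] for t in tokens_b]
    score = cosine(mean_pool(vecs_a), mean_pool(vecs_b))

    alignments = []
    for tok, va in zip(tokens_a, vecs_a):
        sims = [cosine(va, vb) for vb in vecs_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        alignments.append((tok, tokens_b[best], sims[best]))

    # Strongest matches first: these serve as the "reasons" shown to a developer.
    alignments.sort(key=lambda item: item[2], reverse=True)
    return score, alignments


if __name__ == "__main__":
    score, pairs = explain_similarity(["def", "sum", "return"], ["def", "total", "return"])
    print(f"similarity = {score:.3f}")
    for tok_a, tok_b, sim in pairs:
        print(f"{tok_a:>8} -> {tok_b:<8} {sim:.3f}")
```

With the toy table, `def` and `return` align to identical tokens while `sum` aligns to `total`, which is the kind of evidence a developer could inspect to see why two fragments are judged similar despite using different identifiers.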

Code & Models

Repositories

jorge-martinez-gil/graphcodebert-interpretability
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques