CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs
Alex Mathai, Kranthi Sedamaki, Debeshee Das, Noble Saji Mathews, Srikanth Tamilselvam, Sridhar Chimalakonda, Atul Kumar

TL;DR
CodeSAM introduces a flexible framework for enhancing transformer-based source code models by integrating multiple structural code-views through self-attention masks, leading to improved performance on key software engineering tasks.
Contribution
It presents a novel scalable method to infuse multiple code-views into transformer models via self-attention masks, improving downstream SE task performance.
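As a rough illustration of what infusing code-views through a self-attention mask could look like, the sketch below unions the edges of several graphs into one boolean mask and applies it before the softmax. It is not the authors' implementation: the function names, the toy edge lists, the undirected treatment of edges, and the always-on self-loops are all assumptions made for this example.

```python
import math

def build_attention_mask(num_tokens, code_views):
    """Union the edges of every code-view into one boolean mask."""
    # Assumption: every token may always attend to itself.
    mask = [[i == j for j in range(num_tokens)] for i in range(num_tokens)]
    for edges in code_views.values():
        for src, dst in edges:
            mask[src][dst] = True
            mask[dst][src] = True  # assumption: edges are treated as undirected
    return mask

def masked_attention_weights(scores, mask):
    """Row-wise softmax in which masked-out positions receive zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical edges over four tokens; real views would come from a parser.
code_views = {
    "ast": [(0, 1), (1, 2)],
    "dfg": [(0, 3)],
}
mask = build_attention_mask(4, code_views)
scores = [[0.0] * 4 for _ in range(4)]  # placeholder query-key scores
weights = masked_attention_weights(scores, mask)
```

In this toy setup, adding another code-view is just another entry in the `code_views` dictionary, which is one way to read the flexibility claim above.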
Findings
Outperforms GraphCodeBERT and CodeBERT on code search, clone detection, and classification.
Utilizing multiple code-views enhances model effectiveness.
Enables resource-efficient, high-performing code representations.
Abstract
Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspired by natural language processing (NLP), have shown remarkable success in SE tasks. However, source code contains structural and semantic properties embedded within its grammar, which can be extracted from structured code-views like the Abstract Syntax Tree (AST), Data-Flow Graph (DFG), and Control-Flow Graph (CFG). These code-views can complement NLP techniques, further improving SE tasks. Unfortunately, there are no flexible frameworks to infuse arbitrary code-views into existing…
Taxonomy
Topics: Software Engineering Research · Text Readability and Simplification · Natural Language Processing Techniques
Methods: CodeBERT
