Adding Context to Source Code Representations for Deep Learning

Fuwei Tian; Christoph Treude

arXiv:2208.00203·cs.SE·August 2, 2022

Open Access

TL;DR

This paper argues that incorporating additional contextual information, such as call hierarchy, into source code representations can enhance deep learning models' performance on software engineering tasks.

Contribution

It introduces the idea of adding call hierarchy context to source code representations and provides preliminary evidence of performance improvements.

Findings

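01 Encoding context from the call hierarchy alongside the code itself can improve model performance (preliminary evidence)

02 Deep learning models for software engineering tasks are argued to benefit from additional contextual information about the code being analysed

03 Preliminary results support further research in context-aware representations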

Abstract

Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. To apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), focus only on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the…
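To make the idea of "encoding context from the call hierarchy along with information from the code itself" concrete, here is a minimal sketch of one possible token-level encoding. It is an illustration under assumptions, not the authors' method: the separator token `<sep>`, the function names `invert_call_graph` and `with_call_context`, and the toy `tokens` and `calls` data are all hypothetical. The sketch simply places the tokens of a method's callers before it and the tokens of its callees after it.

```python
from collections import defaultdict

SEP = "<sep>"  # hypothetical separator token, not taken from the paper


def invert_call_graph(calls):
    """Map each callee name to the sorted list of methods that call it."""
    callers = defaultdict(list)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)
    return {name: sorted(names) for name, names in callers.items()}


def with_call_context(method, tokens, calls):
    """Return caller tokens + SEP + the method's own tokens + SEP + callee tokens.

    `tokens` maps method names to token lists; `calls` maps each caller
    to the list of methods it calls. Both structures are assumptions.
    """
    callers = invert_call_graph(calls).get(method, [])
    callees = sorted(calls.get(method, []))
    caller_tokens = [t for name in callers for t in tokens.get(name, [])]
    callee_tokens = [t for name in callees for t in tokens.get(name, [])]
    return caller_tokens + [SEP] + tokens[method] + [SEP] + callee_tokens


# Toy example: main calls parse, and parse calls read.
tokens = {
    "main": ["def", "main", "(", ")", ":", "parse", "(", ")"],
    "parse": ["def", "parse", "(", ")", ":", "read", "(", ")"],
    "read": ["def", "read", "(", ")", ":", "pass"],
}
calls = {"main": ["parse"], "parse": ["read"]}

augmented = with_call_context("parse", tokens, calls)
```

The same one-level caller/callee lookup could in principle feed a tree- or graph-based representation instead of a flat token sequence; the flat form is used here only because it is the simplest to show.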

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Malware Detection Techniques