Adding Context to Source Code Representations for Deep Learning
Fuwei Tian, Christoph Treude

TL;DR
This paper argues that incorporating additional contextual information, such as call hierarchy, into source code representations can enhance deep learning models' performance on software engineering tasks.
Contribution
It introduces the idea of adding call hierarchy context to source code representations and provides preliminary evidence of performance improvements.
Findings
Encoding call hierarchy context can improve model performance, according to preliminary evidence
Contextual information enhances code analysis tasks
Preliminary results support further research in context-aware representations
Abstract
Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs), focus only on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the…
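To make the idea of call hierarchy context concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python source as input (the abstract does not name the language studied) and uses the standard-library `ast` module (`ast.unparse` requires Python 3.9 or later). The function names `build_call_graph` and `with_call_context`, the `<CALLER>`/`<TARGET>`/`<CALLEE>` markers, and the `SAMPLE` snippet are all hypothetical choices made for illustration. The sketch finds the direct callers and callees of a target function and concatenates their code with the target's code, yielding a single text input that a token-based model could consume.

```python
import ast

# Hypothetical sample program used only to illustrate the idea.
SAMPLE = '''
def helper(x):
    return x + 1

def target(x):
    return helper(x) * 2

def main():
    print(target(3))
'''


def build_call_graph(source):
    """Return (funcs, callees) for the top-level functions in `source`.

    funcs maps a function name to its AST node; callees maps a function
    name to the set of top-level functions it calls directly by name.
    """
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    callees = {name: set() for name in funcs}
    for name, node in funcs.items():
        for sub in ast.walk(node):
            # Only plain-name calls (e.g. helper(x)) are resolved here;
            # method calls and imports are ignored in this sketch.
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Name)
                    and sub.func.id in funcs):
                callees[name].add(sub.func.id)
    return funcs, callees


def with_call_context(source, target):
    """Concatenate direct callers, the target, and direct callees as text."""
    funcs, callees = build_call_graph(source)
    callers = sorted(n for n, cs in callees.items()
                     if target in cs and n != target)
    called = sorted(n for n in callees[target] if n != target)
    parts = ([("CALLER", n) for n in callers]
             + [("TARGET", target)]
             + [("CALLEE", n) for n in called])
    return "\n".join(f"# <{role}>\n{ast.unparse(funcs[n])}" for role, n in parts)


if __name__ == "__main__":
    print(with_call_context(SAMPLE, "target"))
```

Plain concatenation is only one possible encoding, chosen here for brevity. The same caller and callee sets could instead be attached as extra nodes or edges in an AST- or graph-based representation.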
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Malware Detection Techniques
