Variables are a Curse in Software Vulnerability Prediction
Jinghua Groppe, Sven Groppe, Ralf Möller

TL;DR
This paper introduces a novel approach to software vulnerability prediction that removes variable naming dependencies, enabling models to better understand code functionality and significantly reduce memory usage.
Contribution
The paper proposes a new edge type called name dependence and a 3-property encoding scheme to abstract variable names, improving vulnerability prediction and memory efficiency.
Findings
Models with the new techniques outperform existing approaches in vulnerability prediction.
Memory usage is reduced by up to 30,000-fold with the proposed methods.
The approach helps models capture code functionality beyond the surface text.
Abstract
Current deep learning-based approaches to software vulnerability prediction mainly use the original text of the code as node features in the code graph. They may therefore learn a representation that is specific to the code text rather than one that captures the 'intrinsic' functionality of the program hidden behind that text. One curse behind this problem is the infinite number of possible names for a variable. To lift the curse, this work introduces a new type of edge called name dependence, a type of abstract syntax graph based on name dependence, and an efficient node representation method named the 3-property encoding scheme. These techniques allow concrete variable names to be removed from code and help deep learning models learn the functionality of software hidden in diverse code expressions. The…
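The intuition behind removing variable names can be illustrated with a minimal sketch. It assumes one plausible reading of "name dependence": each occurrence of a variable is linked by an edge to the previous occurrence of the same name, after which the name itself is replaced by a generic placeholder. The placeholder label `VAR`, the helper `abstract_names`, the simple regex tokenizer, and the rule that an identifier followed by `(` is a function name (and kept) are all hypothetical choices for illustration. This is not the paper's 3-property encoding scheme, which the abstract does not describe.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny keyword set for C-like snippets.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


@dataclass
class NameAbstractedGraph:
    nodes: list = field(default_factory=list)       # token labels, variables replaced by "VAR"
    name_edges: list = field(default_factory=list)  # (earlier_index, later_index) per reused name


def abstract_names(code: str) -> NameAbstractedGraph:
    """Replace variable names with a placeholder, keeping only which
    occurrences share a name (assumed 'name dependence' edges)."""
    tokens = TOKEN_RE.findall(code)
    graph = NameAbstractedGraph()
    last_seen = {}  # variable name -> index of its most recent occurrence
    for i, tok in enumerate(tokens):
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        is_variable = (
            IDENT_RE.fullmatch(tok) is not None
            and tok not in C_KEYWORDS
            and next_tok != "("  # identifiers before "(" are kept as function names
        )
        if is_variable:
            graph.nodes.append("VAR")
            if tok in last_seen:
                graph.name_edges.append((last_seen[tok], i))
            last_seen[tok] = i
        else:
            graph.nodes.append(tok)
    return graph


# Two snippets that differ only in variable names map to the same graph ...
a = abstract_names("int n = len; buf[n] = 0;")
b = abstract_names("int size = count; dst[size] = 0;")
# ... while a snippet with different data flow does not.
c = abstract_names("int n = len; buf[len] = 0;")
print(a == b, a == c)
```

Here `a == b` holds because both snippets reduce to the same token labels and the same single edge between the declared variable and its later use as an index, whereas `c` indexes with a different variable and so gets a different edge. The edges retain the information the names carried (which occurrences refer to the same variable) without the model ever seeing the names themselves.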
