Vulnerability Detection via Topological Analysis of Attention Maps
Pavel Snopov, Andrey Nikolaevich Golubinskiy

TL;DR
This paper introduces a vulnerability detection method that applies topological data analysis to BERT attention maps, and shows that classical ML models trained on the resulting topological features perform competitively with pre-trained language models such as CodeBERTa.
Contribution
The study demonstrates that topological features extracted from attention matrices can be used with machine learning to detect vulnerabilities, offering a new perspective beyond conventional static analysis.
Findings
Topological features from attention maps are effective for vulnerability detection.
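Traditional ML models trained on these topological features perform competitively with pre-trained language models such as CodeBERTa.
The results suggest that TDA tools, including persistent homology, capture semantic information relevant to vulnerabilities.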
Abstract
Recently, deep learning (DL) approaches to vulnerability detection have gained significant traction. These methods demonstrate promising results, often surpassing traditional static code analysis tools in effectiveness. In this study, we explore a novel approach to vulnerability detection that applies tools from topological data analysis (TDA) to the attention matrices of the BERT model. Our findings reveal that traditional machine learning (ML) techniques, when trained on the topological features extracted from these attention matrices, can perform competitively with pre-trained language models such as CodeBERTa. This suggests that TDA tools, including persistent homology, are capable of effectively capturing semantic information critical for identifying vulnerabilities.
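To make "topological features extracted from attention matrices" concrete, here is a minimal sketch of one such feature family: 0-dimensional persistent homology of a single attention matrix viewed as a weighted graph over tokens. It is an illustration under stated assumptions, not the authors' pipeline. The symmetrization by element-wise maximum, the distance 1 - weight, the restriction to H0, and the function names `h0_barcode` and `topological_features` are all choices made for this example. In H0 every component is born at 0, and the death times equal the edge weights of a minimum spanning tree, so Kruskal's algorithm is enough.

```python
import numpy as np


def h0_barcode(attention):
    """Death times of H0 classes for one attention matrix (assumed construction).

    The matrix is symmetrized with an element-wise max and converted to
    distances 1 - weight. Edges are then added in order of increasing
    distance; each merge of two components kills one H0 class.
    """
    a = np.asarray(attention, dtype=float)
    sym = np.maximum(a, a.T)   # undirected edge weights (assumption)
    dist = 1.0 - sym           # strong attention -> short distance
    n = dist.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")
    deaths = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        ri, rj = find(i), find(j)
        if ri != rj:           # this edge merges two components
            parent[ri] = rj
            deaths.append(float(dist[i, j]))
    return np.array(deaths)


def topological_features(attention):
    """Summary statistics of the H0 barcode, usable as ML input features."""
    deaths = h0_barcode(attention)
    if deaths.size == 0:
        return {"h0_count": 0, "h0_sum": 0.0, "h0_mean": 0.0, "h0_max": 0.0}
    return {
        "h0_count": int(deaths.size),
        "h0_sum": float(deaths.sum()),
        "h0_mean": float(deaths.mean()),
        "h0_max": float(deaths.max()),
    }


# Toy 3-token attention matrix (rows sum to 1); values are made up.
toy_attention = np.array([
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.2, 0.2],
])
features = topological_features(toy_attention)
```

In practice such statistics would be computed per layer and per head, concatenated into one vector per code sample, and passed to a conventional classifier. The choice of classifier and any higher-dimensional homology features are left open here.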
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Linear Layer · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · WordPiece · Attention Dropout · Residual Connection
