Multi-View Pre-Trained Model for Code Vulnerability Identification
Xuxiang Jiang, Yinhao Xiao, Jun Wang, Wei Zhang

TL;DR
This paper introduces MV-PTM, a multi-view pre-trained model that encodes both sequential and structural code information and improves on GraphCodeBERT by 3.36% in F1 score on average for vulnerability identification.
Contribution
The paper presents MV-PTM, a multi-view pre-trained model that encodes sequential and multiple types of structural information from source code and uses contrastive learning to enhance the resulting code representations for vulnerability identification.
Findings
MV-PTM outperforms GraphCodeBERT by 3.36% in F1 score on average across two public datasets.
Encoding multiple structural views improves vulnerability detection.
Contrastive learning enhances code representation quality.
Abstract
Vulnerability identification is crucial for cyber security in the software industry. Early identification methods require significant manual effort to craft features or annotate vulnerable code. Although recent pre-trained models alleviate this issue, they overlook the rich and varied structural information contained in the code itself. In this paper, we propose a novel Multi-View Pre-Trained Model (MV-PTM) that encodes both sequential and multi-type structural information of the source code and uses contrastive learning to enhance code representations. Experiments conducted on two public datasets demonstrate the superiority of MV-PTM. In particular, MV-PTM improves on GraphCodeBERT by 3.36% on average in terms of F1 score.
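To make the contrastive-learning idea in the abstract concrete, here is a minimal sketch of an InfoNCE-style loss that pulls together two views of the same code snippet and pushes apart views of different snippets. The abstract does not specify the loss, encoders, or temperature used by MV-PTM, so the function names (`cosine`, `info_nce`), the temperature value, and the toy two-dimensional embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(seq_views, struct_views, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    seq_views[i] and struct_views[i] are assumed to be embeddings of the
    same code snippet under a sequential view and a structural view.
    Every other snippet's structural view in the batch acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(seq_views):
        logits = [cosine(anchor, other) / temperature for other in struct_views]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching view.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings (assumed values, not from the paper).
seq = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # structural views agree with seq
swapped = [[0.1, 0.9], [0.9, 0.1]]   # structural views are mismatched

loss_aligned = info_nce(seq, aligned)
loss_swapped = info_nce(seq, swapped)
```

In this toy setup the loss is small when the two views of each snippet agree and large when they are mismatched, which is the signal a contrastive objective uses to shape the representation space.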
Taxonomy
Topics: Software Engineering Research · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques
Methods: Contrastive Learning
