Multi-View Pre-Trained Model for Code Vulnerability Identification

Xuxiang Jiang; Yinhao Xiao; Jun Wang; Wei Zhang

arXiv:2208.05227 · cs.SE · August 11, 2022

TL;DR

This paper introduces MV-PTM, a multi-view pre-trained model that encodes both sequential and multi-type structural information of source code. On two public datasets it improves on GraphCodeBERT by 3.36% in F1 score on average for vulnerability identification.

Contribution

The paper presents a multi-view pre-trained model that encodes the token sequence together with several types of structural information from the source code, and uses contrastive learning to strengthen the resulting code representations for vulnerability identification.

Findings

1. MV-PTM improves on GraphCodeBERT by 3.36% in F1 score on average across two public datasets.

2. Encoding multiple structural views of the code improves vulnerability detection.

3. Contrastive learning enhances the quality of the code representations.

Abstract

Vulnerability identification is crucial for cyber security in the software-related industry. Early identification methods require significant manual effort in crafting features or annotating vulnerable code. Although recent pre-trained models alleviate this issue, they overlook the rich, multi-type structural information contained in the code itself. In this paper, we propose a novel Multi-View Pre-Trained Model (MV-PTM) that encodes both sequential and multi-type structural information of the source code and uses contrastive learning to enhance code representations. Experiments conducted on two public datasets demonstrate the superiority of MV-PTM. In particular, MV-PTM improves on GraphCodeBERT by 3.36% on average in terms of F1 score.
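The abstract does not spell out the contrastive objective, so the following is only a rough sketch of the general idea, not the authors' method. It assumes an InfoNCE-style loss in which the sequence-view embedding and the structure-view embedding of the same code snippet form a positive pair, and the structure views of the other snippets in the batch act as negatives. The function names `cosine` and `info_nce`, the temperature value, and the plain-list vector format are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_views, struct_views, temperature=0.1):
    """InfoNCE-style loss over a batch of paired embeddings (assumed setup).

    seq_views[i] and struct_views[i] are assumed to embed the same snippet
    (the positive pair); struct_views[j] for j != i serve as negatives.
    """
    losses = []
    for i, anchor in enumerate(seq_views):
        # Temperature-scaled similarity of the anchor to every structure view.
        logits = [cosine(anchor, other) / temperature for other in struct_views]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the matching structure view.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

Under these assumptions, the loss is small when each sequence view is most similar to its own structure view and grows when the pairing is mismatched, which is the behaviour a contrastive objective of this kind is meant to encourage.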

Taxonomy

Topics: Software Engineering Research · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques

Methods: Contrastive Learning