Multi-LLM Collaboration + Data-Centric Innovation = 2x Better Vulnerability Repair
Xin Zhou, Kisub Kim, Bowen Xu, DongGyun Han, David Lo

TL;DR
This paper introduces VulMaster, a Transformer-based model that combines multi-LLM collaboration with data-centric strategies to improve automatic software vulnerability repair, particularly when the vulnerable code is lengthy and when expert knowledge from the CWE system can inform the fix.
Contribution
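VulMaster combines several types of input data (the complete vulnerable code, its code structure, and expert knowledge from the CWE system) with LLM collaboration, addressing three limitations of existing DL-based repair methods: difficulty with lengthy code, neglect of code structure, and no use of expert knowledge.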
Findings
VulMaster outperforms state-of-the-art approaches on exact match (EM), BLEU, and CodeBLEU scores.
The model effectively handles lengthy code and incorporates expert knowledge.
Experimental results show substantial improvements on a large real-world dataset.
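As a rough illustration of the EM metric named above, the sketch below scores a generated repair as a hit only when it equals the reference fix after whitespace normalization. The function names and the normalization rule are assumptions made for this example; the paper's own evaluation script may normalize differently.

```python
def normalize(code: str) -> str:
    """Collapse every run of whitespace so formatting differences are ignored.

    Assumption: whitespace-insensitive comparison; not taken from the paper.
    """
    return " ".join(code.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions identical to their reference fix."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)


# Hypothetical data: the first repair differs from its reference only in layout,
# the second is missing a statement.
preds = ["if (len < size) { copy(buf, src, len); }", "free(p);"]
refs = ["if (len < size) {\n    copy(buf, src, len);\n}", "free(p); p = NULL;"]
score = exact_match(preds, refs)
```

EM is strict: a repair that is semantically correct but written differently counts as a miss, which is why it is usually reported alongside softer overlap metrics such as BLEU and CodeBLEU.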
Abstract
The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from the vulnerable code to the fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language texts, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge present in the expert system. To address this, we propose VulMaster, a Transformer-based neural network model that excels at generating vulnerability repairs through data-centric innovation. Specifically, VulMaster introduces the utilization and combination of various types of input data, including complete vulnerable code of any size, vulnerable code structures, and expert knowledge from the CWE system. Additionally,…
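The abstract as shown here does not say how VulMaster accepts vulnerable code of any size. One common way to fit long inputs into a fixed-size encoder window is to split the token sequence into overlapping chunks and encode each chunk separately. The sketch below illustrates only that generic idea; the function name, chunk size, and overlap are hypothetical and should not be read as the paper's method.

```python
def split_into_chunks(
    tokens: list[str], chunk_size: int, overlap: int = 0
) -> list[list[str]]:
    """Split a token sequence into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens so that context at a chunk
    boundary is not lost. Illustrative only; parameters are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Hypothetical ten-token input, windows of four tokens overlapping by one.
chunks = split_into_chunks(list("abcdefghij"), chunk_size=4, overlap=1)
```

With these settings the input is covered by three windows, and every token appears in at least one of them, so no part of the vulnerable function is truncated away.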
Taxonomy
Topics: Software System Performance and Reliability · Network Security and Intrusion Detection · Data Quality and Management
