DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models

Berkay Berabi; Alexey Gronskiy; Veselin Raychev; Gishor Sivanrupan; Victor Chibotaru; Martin Vechev

arXiv:2402.13291·cs.CR·February 26, 2024·3 cites

Open Access

TL;DR

This paper explores combining large language models with program-analysis-based code reduction to fix complex security vulnerabilities. The best system removes over 80% of defects and exactly matches the human fix in 10-50% of cases.

Contribution

It introduces a novel approach combining program analysis with LLMs to improve security bug fixing, reducing training data needs and enhancing fix accuracy.

Findings

01

Over 80% defect removal rate with the best system

02

Exact human fix matching in 10-50% of cases

03

Outperforms GPT-3.5, GPT-4, and TFix baselines

Abstract

The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven difficult. A promising direction for addressing this challenge is to leverage large language models (LLMs), which are increasingly used to solve various programming tasks. In this paper, we investigate the effectiveness of LLMs for solving the code-repair task. We show that the task is difficult as it requires the model to learn long-range code relationships, a task that inherently relies on extensive amounts of training data. At the same time, creating a large, clean dataset for complex program bugs and their corresponding fixes is non-trivial. We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs. The idea is to…

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Digital and Cyber Forensics

Methods: Attention Is All You Need · Label Smoothing · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Transformer