Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models

Peter Ince; Xiapu Luo; Jiangshan Yu; Joseph K. Liu; Xiaoning Du

arXiv:2407.08969·cs.CR·July 17, 2024

TL;DR

This study shows that fine-tuned models, namely the open-source Detect Llama - Foundation and OpenAI's fine-tuned GPT-3.5 Turbo (GPT-3.5FT), can outperform GPT-4 and GPT-4 Turbo at detecting vulnerabilities in smart contracts, achieving higher F1 scores on a test set developed by the authors.

Contribution

The paper introduces Detect Llama, two open-source models fine-tuned from Meta's Code Llama, and shows that the Foundation variant surpasses GPT-4 in smart contract vulnerability detection, providing a cost-effective alternative to proprietary models.

Findings

01

GPT-3.5FT and Detect Llama - Foundation outperform GPT-4 and GPT-4 Turbo in binary vulnerability classification, with F1 scores of 0.776 and 0.68 against 0.66 and 0.675.

02

The fine-tuned models also achieve higher weighted F1 scores than GPT-4 when identifying individual vulnerabilities.

03

Open-source models can effectively detect smart contract vulnerabilities, rivaling proprietary models.

Abstract

In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama on a dataset of 17k prompts, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evaluate these models, plus a random baseline, against GPT-4 and GPT-4 Turbo on a test set we develop, comparing detection of eight vulnerabilities from the dataset and of the two top identified vulnerabilities, along with their weighted F1 scores. We find that for binary classification (i.e., is this smart contract vulnerable?), our two best-performing models, GPT-3.5FT and Detect Llama - Foundation, achieve F1 scores of $0.776$ and $0.68$, outperforming both GPT-4 and GPT-4 Turbo, at $0.66$ and $0.675$. For the…
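As a rough illustration of the two metrics named in the abstract, the sketch below computes a per-label F1 score and a support-weighted F1 in plain Python. It is not the authors' evaluation code: the function names are hypothetical, the toy labels are invented for the example, and it assumes "weighted F1" means the per-label F1 averaged by label frequency in the ground truth.

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 score for a single label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged by each label's share of the ground truth.

    Assumption: this support-weighted average is what "weighted F1" refers to.
    """
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        f1_for_label(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Toy data, invented for illustration: 1 = vulnerable, 0 = not vulnerable.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

binary_f1 = f1_for_label(y_true, y_pred, label=1)
overall_weighted_f1 = weighted_f1(y_true, y_pred)
```

For the binary question ("is this smart contract vulnerable?") only the F1 of the vulnerable label matters; the weighted variant becomes relevant once several vulnerability classes with different frequencies are scored together.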

Code & Models

Repositories

peterdouglas/detect-llama-evaluation
Official
