TL;DR
This study reports that fine-tuned models, namely the open-source Detect Llama and OpenAI's fine-tuned GPT-3.5 Turbo (GPT-3.5FT), can outperform GPT-4 at detecting vulnerabilities in smart contracts, achieving higher F1 scores on a test set the authors developed.
Contribution
The paper introduces Detect Llama, open-source models fine-tuned from Meta's Code Llama whose best variant outperforms GPT-4 in smart contract vulnerability detection, offering a cost-effective alternative to proprietary models.
Findings
GPT-3.5FT and Detect Llama - Foundation outperform both GPT-4 and GPT-4 Turbo in binary vulnerability classification.
Fine-tuned models achieve higher F1 scores on vulnerability identification.
Open-source models can effectively detect smart contract vulnerabilities, rivaling proprietary models.
Abstract
In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well in general, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. Using a dataset of 17k prompts, we fine-tune two models from Meta's Code Llama, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evaluate these models, plus a random baseline, against GPT-4 and GPT-4 Turbo on a test set we develop, measuring their detection of eight vulnerabilities from the dataset and of the two top identified vulnerabilities, along with their weighted F1 scores. We find that for binary classification (i.e., is this smart contract vulnerable?), our two best-performing models, GPT-3.5FT and Detect Llama - Foundation, achieve F1 scores of and , outperforming both GPT-4 and GPT-4 Turbo, and . For the…
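The comparison in the abstract rests on F1 and weighted F1 scores measured against a random baseline. The sketch below shows how those metrics could be computed from per-contract labels. It assumes one ground-truth label and one predicted label per contract; the label names, toy data, and the uniform-random baseline are hypothetical illustrations, not the authors' evaluation code or test set.

```python
import random
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 for one label treated as the positive class (0.0 when there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_per_class(y_true, y_pred, label) * n / total
               for label, n in support.items())


def random_baseline(y_true, labels, seed=0):
    """Hypothetical baseline: choose a label uniformly at random for each contract."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in y_true]


# Hypothetical toy data for the binary task ("is this smart contract vulnerable?").
y_true = ["vulnerable", "vulnerable", "safe", "safe"]
y_pred = ["vulnerable", "safe", "safe", "safe"]

print(round(f1_per_class(y_true, y_pred, "vulnerable"), 3))  # → 0.667 (binary F1)
print(round(weighted_f1(y_true, y_pred), 3))                  # → 0.733

baseline = random_baseline(y_true, ["vulnerable", "safe"], seed=1)
print(round(weighted_f1(y_true, baseline), 3))
```

Binary F1 here treats "vulnerable" as the positive class. The weighted variant averages per-class F1 by class frequency, which should match the convention of scikit-learn's `f1_score(..., average="weighted")`. The same functions apply unchanged when the labels are vulnerability types rather than a yes/no flag.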
