Smart-LLaMA: Two-Stage Post-Training of Large Language Models for Smart Contract Vulnerability Detection and Explanation

Lei Yu; Shiqi Chen; Hang Yuan; Peng Wang; Zhirong Huang; Jingyuan Zhang; Chenjie Shen; Fengjun Zhang; Li Yang; Jiajia Ma

arXiv:2411.06221 · cs.CR · November 12, 2024 · 2 cites

TL;DR

Smart-LLaMA is a two-stage post-training approach for large language models that improves smart contract vulnerability detection and explanation through domain-specific continual pre-training followed by explanation-guided fine-tuning, outperforming existing methods.

Contribution

It introduces a comprehensive dataset, smart contract-specific continual pre-training, and explanation-guided fine-tuning for LLMs in smart contract security.

Findings

01 Outperforms state-of-the-art baselines in detection accuracy

02 Provides reliable and detailed vulnerability explanations

03 Improves F1 score by 6.49% and accuracy by 3.78% over baselines

Abstract

With the rapid development of blockchain technology, smart contract security has become a critical challenge. Existing smart contract vulnerability detection methods face three main issues: (1) Insufficient quality of datasets, lacking detailed explanations and precise vulnerability locations. (2) Limited adaptability of large language models (LLMs) to the smart contract domain, as most LLMs are pre-trained on general text data but minimal smart contract-specific data. (3) Lack of high-quality explanations for detected vulnerabilities, as existing methods focus solely on detection without clear explanations. These limitations hinder detection performance and make it harder for developers to understand and fix vulnerabilities quickly, potentially leading to severe financial losses. To address these problems, we propose Smart-LLaMA, an advanced detection method based on the LLaMA language…

Taxonomy

Topics: Artificial Intelligence in Law · FinTech, Crowdfunding, Digital Finance · Insurance and Financial Risk Management

Methods: LLaMA · Focus