Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding

André Storhaug, Jingyue Li, and Tianyuan Hu

arXiv:2309.09826 · cs.CR · October 9, 2023 · 1 citation

Open Access 1 Models 1 Datasets

TL;DR

This paper introduces a vulnerability-constrained decoding method for transformer-based code auto-completion models, significantly reducing the generation of vulnerable smart contract code during automatic completion.

Contribution

It presents a novel approach that fine-tunes the model to emit vulnerability labels, so that the model acts as an embedded classifier, and then blocks those labels during decoding to avoid generating vulnerable code in smart contract auto-completion.

Findings

01. The fine-tuned model achieved an average BLEU score of 0.557.

02. Over 70% of the auto-completed code samples were initially vulnerable.

03. The approach avoided generating 67% of potential vulnerabilities.
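For readers unfamiliar with the BLEU figure reported above, the following is a minimal sketch of a sentence-level BLEU computation: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. The tokenization, the lack of smoothing, and the example token lists are assumptions made for illustration; the exact BLEU variant used by the authors is not stated on this page.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU with uniform n-gram weights."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical, whitespace-tokenized Solidity fragments.
reference = "function withdraw ( uint amount ) public".split()
candidate = "function withdraw ( uint value ) public".split()

identical_score = bleu(reference, reference)
partial_score = bleu(candidate, reference)
print(identical_score, partial_score)
```

An identical completion scores 1, and any deviation from the reference lowers the score, so an average of 0.557 indicates substantial but partial n-gram overlap with the reference code.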

Abstract

Auto-completing code enables developers to speed up coding significantly. Recent advances in transformer-based large language model (LLM) technologies have been applied to code synthesis. However, studies show that much of this synthesized code contains vulnerabilities. We propose a novel vulnerability-constrained decoding approach to reduce the amount of vulnerable code generated by such models. Using a small dataset of labeled vulnerable lines of code, we fine-tune an LLM to include vulnerability labels when generating code, so that it acts as an embedded classifier. Then, during decoding, we prevent the model from generating these labels to avoid generating vulnerable code. To evaluate the method, we chose automatic completion of Ethereum Blockchain smart contracts (SCs) as the case study due to the strict requirements of SC security. We first fine-tuned the 6-billion-parameter GPT-J model using…
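The decoding step described in the abstract can be illustrated with a toy sketch. Everything here is hypothetical: the six-entry vocabulary, the `<VULN>` label token, and the hand-written `toy_logits` table merely stand in for a fine-tuned model that has learned to emit a label next to vulnerable code. The sketch shows only the masking idea (setting the label token's logit to negative infinity before picking the next token); it is not the authors' implementation, and the search strategy they use is not given on this page.

```python
import math

# Hypothetical vocabulary; "<VULN>" stands in for a vulnerability label token.
VOCAB = ["msg.sender", ".call", ".transfer", "(amount)", "<VULN>", "<EOS>"]
VULN_ID = VOCAB.index("<VULN>")
EOS_ID = VOCAB.index("<EOS>")


def toy_logits(prefix):
    """Stand-in for a fine-tuned model: one logit per vocabulary entry."""
    last = prefix[-1] if prefix else None
    table = {
        None: {"msg.sender": 2.0},
        # The toy model prefers to emit the label and then the risky call.
        "msg.sender": {"<VULN>": 3.0, ".transfer": 2.0, ".call": 1.0},
        "<VULN>": {".call": 3.0},
        ".transfer": {"(amount)": 2.0},
        ".call": {"(amount)": 2.0},
        "(amount)": {"<EOS>": 2.0},
    }
    prefs = table.get(last, {"<EOS>": 1.0})
    return [prefs.get(tok, -5.0) for tok in VOCAB]


def greedy_decode(banned_ids=(), max_steps=10):
    """Greedy decoding; banned token ids can never be selected."""
    out = []
    for _ in range(max_steps):
        logits = toy_logits(out)
        for i in banned_ids:
            logits[i] = -math.inf  # the constraint: forbid the label token
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_ID:
            break
        out.append(VOCAB[next_id])
    return out


unconstrained = greedy_decode()
constrained = greedy_decode(banned_ids=[VULN_ID])
print(unconstrained)
print(constrained)
```

In this toy setup, banning the label makes the decoder fall back to the next-best continuation, which is the unlabeled path. With Hugging Face `transformers`, a comparable constraint can be expressed through the `bad_words_ids` argument of `generate`; whether the authors do it that way is an assumption, not something this page confirms.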

Code & Models

Models

andstor/gpt-j-6B-smart-contract
model · ♡ 4

Datasets

andstor/smart_contracts
dataset · 18 dl

Taxonomy

Topics: Blockchain Technology Applications and Security

Methods: Vulnerability-constrained Decoding