Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions
Stefan-Claudiu Susan, Andrei Arusoaie, Dorel Lucanu

TL;DR
This paper evaluates how reliably Large Language Models (LLMs) detect vulnerabilities in smart contracts, identifies their current limitations, and discusses potential hybrid solutions.
Contribution
The paper provides an empirical assessment of LLM-based static analysis for smart contracts, shows that frequent false positives limit its reliability, and introduces a custom automated framework for evaluating model outputs.
Findings
LLM efficacy is undermined by lexical bias and by a lack of rigorous validation of external data inputs.
Reliance on non-semantic heuristics, such as identifier naming, leads to a high frequency of false positives.
The authors' custom automated framework achieves 92% accuracy in classifying model outputs.
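To make the lexical-bias finding concrete, here is a minimal sketch of an identifier-renaming probe. It is not the authors' framework: `naive_detector` is a hypothetical toy stand-in for a lexically biased analyser, and the Solidity fragment and renaming map are invented for illustration. The idea is that a semantics-preserving rename should not change a sound verdict, so a flipped verdict suggests the verdict depended on names.

```python
import re

def rename_identifiers(source: str, mapping: dict[str, str]) -> str:
    """Replace whole-word identifiers according to mapping (semantics-preserving rename)."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)

def naive_detector(source: str) -> bool:
    """Hypothetical stand-in for a lexically biased analyser: it flags code
    purely because a suspicious-sounding name appears, ignoring semantics."""
    return bool(re.search(r"withdraw|unsafe", source, re.IGNORECASE))

# Illustrative fragment (an assumption, not taken from the paper). The state
# update happens before the transfer, so the name alone is what triggers the flag.
CONTRACT = """
function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);
    balances[msg.sender] -= amount;
    payable(msg.sender).transfer(amount);
}
"""

renamed = rename_identifiers(CONTRACT, {"withdraw": "f1", "balances": "m1"})

before = naive_detector(CONTRACT)   # verdict on the original names
after = naive_detector(renamed)     # verdict after neutral renaming
verdict_flipped = before != after   # True means the verdict tracked naming, not behaviour
```

In a real experiment the toy detector would be replaced by a call to the model under test, and the probe would be repeated over many contracts.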
Abstract
The irreversible nature of blockchain transactions makes the identification of smart contract vulnerabilities an essential requirement for secure system development. While Large Language Models (LLMs) are increasingly integrated into developer workflows, their reliability as autonomous security auditors remains unproven. We assess whether current generative models are a viable replacement for, or only a complement to, traditional static-analysis tools. Our findings indicate that LLM efficacy is undermined by both inherent lexical bias and a lack of rigorous validation of external data inputs. This reliance on non-semantic heuristics, such as identifier naming, leads to a high frequency of false positives. Furthermore, prompting techniques reveal a trade-off between precision and recall. These results were derived using our custom automated framework, which achieves 92% accuracy in…
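The precision and recall trade-off mentioned in the abstract can be illustrated with a short sketch. The confusion counts below are hypothetical and chosen only to show the shape of the trade-off; they are not results from the paper, and the prompt labels are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """Confusion counts for a binary 'vulnerable / not vulnerable' verdict."""
    tp: int  # vulnerable contracts correctly flagged
    fp: int  # safe contracts wrongly flagged (false positives)
    tn: int  # safe contracts correctly passed
    fn: int  # vulnerable contracts missed

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        vulnerable = self.tp + self.fn
        return self.tp / vulnerable if vulnerable else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

# Hypothetical counts over 100 contracts (50 vulnerable, 50 safe).
conservative_prompt = Confusion(tp=30, fp=5, tn=45, fn=20)   # flags rarely
aggressive_prompt = Confusion(tp=45, fp=30, tn=20, fn=5)     # flags often

for name, c in [("conservative", conservative_prompt), ("aggressive", aggressive_prompt)]:
    print(f"{name}: precision={c.precision():.2f} recall={c.recall():.2f} accuracy={c.accuracy():.2f}")
```

With these illustrative numbers, the prompt that flags more contracts gains recall but loses precision, which is the kind of trade-off the abstract describes.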
