NATLM: Detecting Defects in NFT Smart Contracts Leveraging LLM
Yuanzheng Niu, Xiaoqi Li, Wenkai Li

TL;DR
NATLM is a framework that combines static analysis and large language models to detect common security vulnerabilities in NFT smart contracts with high accuracy, aiming to prevent financial losses from exploits.
Contribution
This paper introduces NATLM, a novel hybrid approach integrating static code analysis with LLMs for effective NFT smart contract defect detection, outperforming existing methods.
Findings
Achieved 87.72% precision in defect detection.
Detected four common types of NFT smart contract vulnerabilities.
Outperformed baseline detection methods.
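Precision is the share of reported defects that are real, so it directly measures the false-positive problem raised in the abstract. Assuming the standard definition (TP = true positives, FP = false positives), the reported figure translates as follows:

```latex
\text{Precision} = \frac{TP}{TP + FP} = 0.8772
\quad\Longrightarrow\quad
\frac{FP}{TP + FP} = 1 - 0.8772 = 0.1228
```

In other words, roughly 12 of every 100 flagged defects would be false alarms.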
Abstract
Security issues are becoming increasingly significant with the rapid evolution of Non-fungible Tokens (NFTs). As NFTs are traded as digital assets, they have emerged as prime targets for cyber attackers. In the development of NFT smart contracts, there may exist undiscovered defects that could lead to substantial financial losses if exploited. To tackle this issue, this paper presents a framework called NATLM (NFT Assistant LLM), designed to detect potential defects in NFT smart contracts. The framework effectively identifies four common types of vulnerabilities in NFT smart contracts: ERC-721 Reentrancy, Public Burn, Risky Mutable Proxy, and Unlimited Minting. Relying exclusively on large language models (LLMs) for defect detection can lead to a high false-positive rate. To enhance detection performance, NATLM integrates static analysis with LLMs, specifically Gemini Pro 1.5. Initially,…
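The abstract's key design point is that static analysis constrains the LLM so that it raises fewer false positives. The sketch below is a rough illustration of that idea, not NATLM's actual pipeline. Two assumptions are worth stating up front:

- `HEURISTICS` is a hypothetical set of coarse regex checks standing in for real static analysis. The reviews indicate that NATLM itself works on ASTs and CFGs.
- `llm_confirm` is a hypothetical callback standing in for the Gemini call.

Only candidates that are flagged statically are forwarded to the LLM, which may then reject them.

```python
import re
from typing import Callable, Dict, List

# HYPOTHETICAL heuristics for the four defect types named in the abstract.
# They over-approximate on purpose; the LLM step is meant to prune them.
HEURISTICS: Dict[str, Callable[[str], bool]] = {
    # _safeMint / safeTransferFrom call onERC721Received on the recipient,
    # handing control to external code.
    "ERC-721 Reentrancy": lambda src: re.search(
        r"\b(_safeMint|safeTransferFrom)\s*\(", src) is not None,
    # A burn function callable from outside the contract.
    "Public Burn": lambda src: re.search(
        r"function\s+burn\s*\([^)]*\)\s*(public|external)", src) is not None,
    # An implementation address that can be swapped after deployment.
    "Risky Mutable Proxy": lambda src: re.search(
        r"function\s+(upgradeTo|setImplementation)\s*\(", src) is not None,
    # A mint function with no visible supply cap anywhere in the source.
    "Unlimited Minting": lambda src: "function mint" in src
        and re.search(r"max_?supply", src, re.IGNORECASE) is None,
}

# Stand-in signature for the LLM verdict: (contract source, defect type) -> confirmed?
LLMConfirm = Callable[[str, str], bool]


def static_candidates(source: str) -> List[str]:
    """Defect types whose heuristic fires on the contract source."""
    return [name for name, check in HEURISTICS.items() if check(source)]


def detect(source: str, llm_confirm: LLMConfirm) -> List[str]:
    """Report only defects flagged statically AND confirmed by the LLM callback."""
    return [d for d in static_candidates(source) if llm_confirm(source, d)]


demo = """
contract Demo is ERC721 {
    function burn(uint256 tokenId) public { _burn(tokenId); }
    function mint(address to, uint256 id) external { _safeMint(to, id); }
}
"""
print(static_candidates(demo))
# ['ERC-721 Reentrancy', 'Public Burn', 'Unlimited Minting']

# A stub LLM that rejects the reentrancy candidate, e.g. after judging the context safe:
print(detect(demo, lambda src, defect: defect != "ERC-721 Reentrancy"))
# ['Public Burn', 'Unlimited Minting']
```

Because the LLM can only veto static findings, it cannot introduce defect reports of its own. That is one simple way a hybrid design can trade recall for precision.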
Peer Reviews
Decision: Submitted to ICLR 2026
- Addressing textual defect detection in natural language models fills an important gap between NLP and software engineering quality assurance.
- The pipeline, from preprocessing, embedding generation, and feature fusion to classification, is logically structured and comprehensible.
- The system is designed to handle different textual artifacts, demonstrating flexibility across requirement documents, bug reports, and natural language code comments.
- Reported metrics show consistent improvements
- The architecture largely adapts existing transformer-based techniques with minor task-specific modifications; conceptual contribution is limited.
- The paper does not clearly describe dataset sources, labeling criteria, or data quality control, which raises concerns about reproducibility and bias.
- No attention visualization or error analysis is provided to explain what types of textual defects are best captured or missed.
- Comparisons are limited to standard text classifiers; no ablation ag
1. A feasible solution is proposed to address the existing problems in NFT vulnerability detection.
2. The method used to generate Xcfg (Word2Vec → TextCNN → GCN) is impressive.
3. The authors constructed a specific dataset containing 8,672 NFT smart contracts, which is valuable for future research.
1. The structure of the paper is confusing. The authors placed both their own contributions and the descriptions of existing tools in Chapter 2: THE NATLM FRAMEWORK. A separate chapter should be devoted to explaining the internal mechanisms of standard models such as CodeBERT, GCN, and TextCNN; otherwise, it is difficult for readers to distinguish which parts are the authors' contributions and which are existing research methods.
2. The necessity of using an LLM is not justified. The authors me
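The Xcfg pipeline highlighted in this review (Word2Vec → TextCNN → GCN) ends with graph convolution over the contract's control-flow graph. The sketch below shows one GCN layer using the common symmetric-normalized rule ReLU(D^-1/2 (A + I) D^-1/2 · H · W), followed by mean pooling into a single graph vector. It is written in plain Python and rests on several assumptions:

- The node features are toy stand-ins for the TextCNN outputs.
- Edges are treated as undirected.
- The weights are an untrained identity matrix.
- Mean pooling is an illustrative choice.

None of these reflect the paper's configuration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(edges: List[Tuple[int, int]], n: int) -> Matrix:
    """D^-1/2 (A + I) D^-1/2, treating CFG edges as undirected (an assumption)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One propagation step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, x) for x in row] for row in z]


def mean_pool(h: Matrix) -> List[float]:
    """Collapse node embeddings into one graph-level vector."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


# Toy CFG: entry (0) -> external call (1) -> state update (2).
edges = [(0, 1), (1, 2)]
# Hypothetical per-node features standing in for TextCNN outputs.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, untrained
a_hat = normalized_adjacency(edges, 3)
x_cfg = mean_pool(gcn_layer(a_hat, h0, w))
print([round(x, 3) for x in x_cfg])  # [0.605, 0.686]
```

Each node's new embedding mixes in its CFG neighbours, scaled by their degrees. This is how structural context, such as an external call sitting between entry and a state update, can reach the pooled vector.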
- By integrating a vector database (VecDB) retrieval mechanism with the large language model Gemini, the approach provides contextual information to the LLM during smart contract vulnerability detection, helping mitigate issues related to memory limitations and hallucinated outputs.
- The method evaluates a diverse range of vulnerability types, covering high-risk defects commonly found in real-world smart contracts.
- The model generates natural language explanations for its predictions, enhanci
- The paper lacks a clear explanation of the fundamental differences in structural features between vulnerable and non-vulnerable smart contracts. Since the differences in feature vectors essentially reflect differences in AST and CFG structures, the authors should provide an analysis of known vulnerability patterns in terms of their CFG and AST characteristics, as well as their execution behaviors.
- The dataset used to construct the RAG knowledge base is sourced from **Yang et al. (2023)**, an
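The third review describes VecDB retrieval that supplies context to Gemini. The sketch below shows the basic retrieval-augmented step: store labelled reference contracts as vectors, pull the top-k most similar ones by cosine similarity, and fold them into the prompt. It makes these assumptions:

- The embeddings are toy 2-dimensional vectors.
- `ToyVecDB`, `build_prompt`, and the prompt wording are hypothetical.
- The Gemini call itself is omitted rather than guessed.

A real system would use a code-embedding model and a dedicated vector store.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


class ToyVecDB:
    """HYPOTHETICAL in-memory stand-in for a vector database of reference contracts."""

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], str]] = []

    def add(self, vector: List[float], document: str) -> None:
        self.entries.append((vector, document))

    def top_k(self, query: List[float], k: int) -> List[str]:
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(target_source: str, references: List[str]) -> str:
    """Assemble retrieved references and the target contract into one LLM prompt."""
    context = "\n\n".join(f"[Reference {i + 1}]\n{doc}" for i, doc in enumerate(references))
    return (
        "Known examples:\n" + context + "\n\n[Target contract]\n" + target_source
        + "\n\nWhich of these defects are present: ERC-721 Reentrancy, Public Burn, "
        "Risky Mutable Proxy, Unlimited Minting? Explain each finding."
    )


db = ToyVecDB()
db.add([1.0, 0.0], "Vulnerable: mint() calls _safeMint before updating state")
db.add([0.0, 1.0], "Safe: burn() checks ownership before _burn")
db.add([0.7, 0.7], "Vulnerable: mint() has no supply cap")

refs = db.top_k([1.0, 0.1], k=2)
print(refs)
# ['Vulnerable: mint() calls _safeMint before updating state', 'Vulnerable: mint() has no supply cap']
prompt = build_prompt("contract Target { ... }", refs)
```

Grounding the prompt in retrieved, labelled examples is the mechanism the reviewer credits with mitigating memory limits and hallucinated outputs. The knowledge base's quality therefore directly bounds what the LLM can be reminded of.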
Taxonomy
Topics: Blockchain Technology Applications and Security · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning
