Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions

Zhuoran Pan; Yue Li; Zhi Guan; Jianbin Hu; Zhong Chen

arXiv:2604.27763 · cs.AI · May 1, 2026

TL;DR

Intent2Tx introduces a comprehensive benchmark for evaluating how well large language models translate natural language intents into Ethereum blockchain transactions, emphasizing real-world complexity and execution accuracy.

Contribution

Intent2Tx provides a large, real-world, execution-aware benchmark and evaluation framework for assessing how well LLMs generate correct on-chain transactions from natural language.

Findings

01. Scaling and retrieval augmentation improve logical consistency.

02. Models struggle with out-of-distribution generalization.

03. Syntactically valid outputs often fail to produce the intended state changes.

Abstract

The emergence of Large Language Models (LLMs) offers a transformative interface for Web3, yet existing benchmarks fail to capture the complexity of translating high-level user intents into functionally correct, state-dependent on-chain transactions. We present Intent2Tx, a high-fidelity benchmark featuring 29,921 single-step and 1,575 multi-step instances meticulously derived from 300 days of real-world Ethereum mainnet traces. Unlike prior work that relies on synthetic instructions, Intent2Tx grounds natural language intents in real-world protocol interactions across 11 categories, including diverse long-tail Decentralized Finance (DeFi) primitives. To enable rigorous evaluation, we propose an execution-aware framework that goes beyond surface-level text matching by employing differential state analysis on forked mainnet environments. Our extensive evaluation of 16…
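
The abstract describes scoring by differential state analysis rather than text matching. The sketch below illustrates that general idea under heavy simplifying assumptions: chain state is a plain Python dict of token balances instead of a forked mainnet node, a transaction is reduced to a hypothetical apply_transfer helper, and a candidate passes if its net state change equals that of the reference transaction. None of these names come from the Intent2Tx repository; they are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical state model: (holder, token) -> balance. A real harness would read
# state from a forked mainnet node; a plain dict stands in for it here.
State = Dict[Tuple[str, str], int]


def apply_transfer(state: State, sender: str, recipient: str, token: str, amount: int) -> State:
    """Hypothetical helper: return a new state after a simplified token transfer."""
    if state.get((sender, token), 0) < amount:
        raise ValueError("insufficient balance")  # stands in for an on-chain revert
    new_state = dict(state)
    new_state[(sender, token)] = new_state.get((sender, token), 0) - amount
    new_state[(recipient, token)] = new_state.get((recipient, token), 0) + amount
    return new_state


def state_diff(before: State, after: State) -> Dict[Tuple[str, str], int]:
    """Net change per (holder, token) key, omitting keys that did not change."""
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    return {k: d for k, d in deltas.items() if d != 0}


def execution_match(pre: State, reference_post: State, candidate_post: State) -> bool:
    """A candidate passes if it causes the same net state change as the reference."""
    return state_diff(pre, reference_post) == state_diff(pre, candidate_post)


pre = {("alice", "USDC"): 1_000, ("bob", "USDC"): 0}
# Intent: "send 100 USDC to bob"
reference = apply_transfer(pre, "alice", "bob", "USDC", 100)
right = apply_transfer(pre, "alice", "bob", "USDC", 100)
# Well-formed and executable, but with the wrong amount.
wrong = apply_transfer(pre, "alice", "bob", "USDC", 1_000)

print(execution_match(pre, reference, right))  # True
print(execution_match(pre, reference, wrong))  # False
```

The wrong candidate executes without error, yet its state diff differs from the reference, which is the kind of failure that text or syntax matching alone would not catch.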

Code & Models

Repositories

https://anonymous.4open.science/r/Intent2Tx_Bench-97FF
