SQLAgent: Learning to Explore Before Generating as a Data Engineer

Wenjia Jiang; Yiwei Wang; Boyan Han; Joey Tianyi Zhou; Chi Zhang

arXiv:2602.01952·cs.DB·February 3, 2026

Open Access 3 Reviews

TL;DR

SQLAgent introduces a two-stage framework where an exploration phase builds a database-specific knowledge base, enabling more accurate SQL query generation in complex, real-world database scenarios.

Contribution

The paper presents a novel, decoupled two-stage LLM-based approach that improves generalization and accuracy in SQL query generation through proactive schema exploration and knowledge collection.

Findings

01

Significant accuracy improvements over baselines on large-scale benchmarks.

02

Effective handling of complex, multi-step reasoning in SQL queries.

03

Proactive schema exploration enhances generalization to unseen databases.

Abstract

Large Language Models have recently shown impressive capabilities in reasoning and code generation, making them promising tools for natural language interfaces to relational databases. However, existing approaches often fail to generalize in complex, real-world settings due to the highly database-specific nature of SQL reasoning, which requires deep familiarity with unique schemas, ambiguous semantics, and intricate join paths. To address this challenge, we introduce a novel two-stage LLM-based framework that decouples knowledge acquisition from query generation. In the Exploration Stage, the system autonomously constructs a database-specific knowledge base by navigating the schema with a Monte Carlo Tree Search-inspired strategy, generating triplets of schema fragments, executable queries, and natural language descriptions as usage examples. In the Deployment Stage, a dual-agent system…
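To make the triplet idea in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a hand-written list of candidate probes, a flat UCB1 selection rule in place of the paper's MCTS-inspired tree search, and a reward of 1 when a query executes and returns rows. The names `Triplet`, `ucb1`, `explore`, and the toy `customers`/`orders` schema are all hypothetical.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass
class Triplet:
    """One knowledge-base entry: schema fragment, executable query, description."""
    schema_fragment: tuple[str, ...]  # tables the query touches
    query: str
    description: str


def ucb1(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCB1 score; unvisited candidates are tried first."""
    if visits == 0:
        return math.inf
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def explore(conn: sqlite3.Connection, candidates, rounds: int) -> list[Triplet]:
    """Flat UCB1 loop over candidate probes (an assumption, not a full tree search).

    A probe earns reward 1.0 if it executes and returns at least one row;
    each rewarded query is stored once as a Triplet.
    """
    stats = [[0.0, 0] for _ in candidates]  # [total_reward, visits] per candidate
    knowledge_base: list[Triplet] = []
    seen: set[str] = set()
    for step in range(1, rounds + 1):
        i = max(range(len(candidates)),
                key=lambda j: ucb1(stats[j][0], stats[j][1], step))
        fragment, query, description = candidates[i]
        try:
            reward = 1.0 if conn.execute(query).fetchall() else 0.0
        except sqlite3.Error:
            reward = 0.0  # non-executable probes are never stored
        stats[i][0] += reward
        stats[i][1] += 1
        if reward > 0 and query not in seen:
            seen.add(query)
            knowledge_base.append(Triplet(fragment, query, description))
    return knowledge_base


# Toy database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 12.5);
""")

candidates = [
    (("customers",), "SELECT name FROM customers", "List all customer names."),
    (("customers", "orders"),
     "SELECT c.name, SUM(o.total) FROM customers c "
     "JOIN orders o ON o.customer_id = c.id GROUP BY c.name",
     "Total order value per customer."),
    (("orders",), "SELECT missing_column FROM orders", "Invalid probe."),
]

knowledge_base = explore(conn, candidates, rounds=6)
for triplet in knowledge_base:
    print(triplet.schema_fragment, "|", triplet.description)
```

Only probes that actually run against the database end up in the knowledge base, which is the property that distinguishes executable triplets from static schema descriptions. In a real system the candidates would presumably be proposed by an LLM rather than listed by hand.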

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- Well-engineered framework. The two-stage design is clearly explained and systematically evaluated.
- Strong empirical results. The method achieves higher execution accuracy with detailed ablation and LLM-backbone analysis.
- Good implementation quality. Integration with LangGraph, Neo4j, and FAISS shows solid engineering and reproducibility.

Weaknesses

- Limited conceptual novelty. The “exploration-before-generation” paradigm is well established in prior agentic LLM work (e.g., ReAct, Reflexion, AlphaCode). The contribution mainly adapts this idea to Text-to-SQL.
- Incremental adaptation. The MCTS-style schema exploration and dual-agent refinement are logical extensions rather than fundamentally new ideas.
- Missing qualitative insights. The paper does not show concrete examples illustrating how exploration improves generation.
- Incomplete ba

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Two-stage design mimics how humans learn databases (explore first, then use).
2. Triplets as executable knowledge vs. static schema descriptions is clever.
3. Ablation shows both stages contribute (+5.8 pts for exploration, +5.7 pts for dual-agent). Consistent gains across different LLM backbones (GPT-4o, Claude, Qwen).

Weaknesses

1. Weak baselines. The paper compares against only 2 baselines: (1) ReFoRCE (20.84%), a concurrent work, and (2) "Spider-Agent" (12.98%), which is a bit naive. Spider 2.0 has a public leaderboard at spider2-sql.github.io. ReFoRCE ranks #8/10 and does not seem to be the state of the art, so comparing only against it is misleading.

The paper claims triplets discover "business logic". But: Standard database profiling tools (Great Expectations, dbt) can extract Unique values per column, Cardinality & distribution

Reviewer 03 · Rating 2 · Confidence 4

Strengths

S1. Motivation: The paper tackles a real pain point, database-specific generalization in NL2SQL, by trying to separate knowledge acquisition from generation.
S2. Systematization attempt: The exploration stage aims to build a database-specific repository of examples that can help downstream generation, which is a reasonable engineering idea even if the theory is thin.
S3. Readable pipeline: The dual-agent design (InfoAgent/GenAgent) and its runtime loop (Figure 2) are clearly diagrammed, aiding r

Weaknesses

The first set of weaknesses (W1-W5) comes from informal and incorrect database concepts and schema modeling.
W1. Oversimplified/ambiguous “schema” notion: Figure 1 describes a schema as “a logical container for a collection of tables,” but the paper never formalizes essential relational concepts (keys, foreign keys, constraints, views, indices) that are indispensable for reasoning about joins and correctness. This under-specification leaks into the rest of the method (e.g., join selection and g

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling