AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale
Ziyang Wang, Yuanlei Zheng, Zhenbiao Cao, Xiaojin Zhang, Zhongyu Wei, Pei Fu, Zhenbo Luo, Wei Chen, Xiang Bai

TL;DR
AutoLink introduces an autonomous, iterative agent-driven framework that dynamically explores and expands schema subsets for large-scale text-to-SQL tasks, significantly improving recall and scalability over existing methods.
Contribution
The paper presents AutoLink, a novel autonomous agent-based approach that enhances schema linking by dynamically exploring schemas without full schema input, achieving state-of-the-art recall and scalability.
Findings
Achieves 97.4% recall on Bird-Dev
Attains 91.2% recall on Spider-2.0-Lite
Maintains high performance on large schemas with over 3,000 columns
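The recall figures above are described in the abstract as strict schema linking recall. The page does not define the metric, so the sketch below assumes one common reading: an example counts as a hit only when every gold column appears in the predicted subset. The function name and the `table.column` string format are illustrative, not taken from the paper.

```python
def strict_recall(predicted: list[set[str]], gold: list[set[str]]) -> float:
    """Fraction of examples whose gold columns are ALL contained in the prediction.

    Assumed definition: one missing gold column makes the whole example a miss.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    # set <= set is the subset test: every gold column must be predicted
    hits = sum(1 for pred, ref in zip(predicted, gold) if ref <= pred)
    return hits / len(gold)


# Toy data (made up for illustration): the second example misses "orders.total".
predicted = [{"customers.id", "customers.name"}, {"orders.id"}]
gold = [{"customers.name"}, {"orders.id", "orders.total"}]
score = strict_recall(predicted, gold)
print(score)
```

Extra predicted columns do not lower this score, which is why the metric is usually read alongside the size of the linked subset.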
Abstract
For industrial-scale text-to-SQL, supplying the entire database schema to Large Language Models (LLMs) is impractical due to context window limits and irrelevant noise. Schema linking, which filters the schema to a relevant subset, is therefore critical. However, existing methods incur prohibitive costs, struggle to trade off recall and noise, and scale poorly to large databases. We present AutoLink, an autonomous agent framework that reformulates schema linking as an iterative, agent-driven process. Guided by an LLM, AutoLink dynamically explores and expands the linked schema subset, progressively identifying necessary schema components without inputting the full database schema. Our experiments demonstrate AutoLink's superior performance, achieving state-of-the-art strict schema linking recall of 97.4% on Bird-Dev and 91.2% on Spider-2.0-Lite, with…
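The iterative explore-and-expand idea in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `keyword_retriever` is a hypothetical token-overlap search standing in for whatever retrieval actions the agent uses, and `toy_policy` is a hard-coded stand-in for the LLM that decides what to look for next. The point it shows is that only the currently linked subset is handed to the policy, never the full schema.

```python
from typing import Callable

Schema = dict[str, list[str]]  # table name -> column names


def keyword_retriever(schema: Schema, query: str, exclude: set[str],
                      k: int = 3, min_overlap: int = 2) -> list[str]:
    """Hypothetical retrieval action: rank unseen columns by token overlap."""
    tokens = set(query.lower().split())
    scored = []
    for table, columns in schema.items():
        for column in columns:
            name = f"{table}.{column}"
            if name in exclude:
                continue
            parts = set(table.split("_")) | set(column.split("_"))
            overlap = len(tokens & parts)
            if overlap >= min_overlap:
                scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


def toy_policy(question: str, linked: set[str]) -> list[str]:
    """Stand-in for the LLM: search with the question, then ask for key columns."""
    if not linked:
        return [question]
    tables = sorted({name.split(".")[0] for name in linked})
    return [f"{t} id" for t in tables if f"{t}.id" not in linked]


def explore_schema(schema: Schema, question: str,
                   propose: Callable[[str, set[str]], list[str]],
                   max_rounds: int = 5) -> set[str]:
    """Grow the linked subset round by round until the policy has no more queries."""
    linked: set[str] = set()
    for _ in range(max_rounds):
        queries = propose(question, linked)  # policy sees only the linked subset
        if not queries:
            break
        before = len(linked)
        for query in queries:
            linked.update(keyword_retriever(schema, query, linked))
        if len(linked) == before:  # nothing new was found; stop early
            break
    return linked


# Made-up schema for illustration only.
schema = {
    "customers": ["id", "name", "city"],
    "orders": ["id", "customer_id", "total", "order_date"],
    "audit_log": ["id", "payload"],
}
linked = explore_schema(schema, "total of orders for each customers name", toy_policy)
print(sorted(linked))
```

In this toy run the first round links the two columns named in the question, and the second round expands the subset with key columns of the tables already found, while the unrelated `audit_log` table is never pulled in.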
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Web Data Mining and Analysis
