Prefix-to-SQL: Text-to-SQL Generation from Incomplete User Questions
Naihao Deng, Shuaichen Chang, Peng Shi, Tao Yu, Rui Zhang

TL;DR
This paper introduces the prefix-to-SQL task, enabling natural language interfaces to handle incomplete user questions by predicting SQL queries, and presents a new benchmark and metric to evaluate this capability.
Contribution
It proposes the prefix-to-SQL task, creates the PAGSAS benchmark with 124K examples, and introduces the SAVE metric to measure user effort reduction.
Findings
PAGSAS is challenging for strong models like T5.
Curriculum learning improves recall scores by up to 9%.
Difficulty is related to the number of tokens omitted from the user question.
Abstract
Existing text-to-SQL research only considers complete questions as the input, but lay users might struggle to formulate a complete question. To build a smarter natural language interface to database systems (NLIDB) that also processes incomplete questions, we propose a new task, prefix-to-SQL, which takes a question prefix from users as the input and predicts the intended SQL. We construct a new benchmark called PAGSAS that contains 124K user question prefixes and the intended SQL for 5 sub-tasks: Advising, GeoQuery, Scholar, ATIS, and Spider. Additionally, we propose a new metric, SAVE, to measure how much effort can be saved by users. Experimental results show that PAGSAS is challenging even for strong baseline models such as T5. As we observe that the difficulty of prefix-to-SQL is related to the number of omitted tokens, we incorporate curriculum learning of feeding examples with an increasing…
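To make the task setup concrete, here is a minimal sketch of how question prefixes could be paired with the intended SQL and then grouped into curriculum stages by the number of omitted tokens. It is an illustration only, not the procedure used to build PAGSAS: the whitespace tokenization, the inclusion of the full question as a zero-omission prefix, the names `PrefixExample`, `make_prefixes`, and `curriculum_stages`, the stage schedule, and the example question and table are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PrefixExample:
    prefix: str   # incomplete question as typed so far
    sql: str      # SQL for the full, intended question
    omitted: int  # tokens dropped from the end of the full question


def make_prefixes(question: str, sql: str, min_len: int = 1) -> list[PrefixExample]:
    """Pair every prefix of the question (whitespace tokens) with the same SQL."""
    tokens = question.split()
    return [
        PrefixExample(" ".join(tokens[:k]), sql, len(tokens) - k)
        for k in range(min_len, len(tokens) + 1)
    ]


def curriculum_stages(
    examples: Iterable[PrefixExample], caps: Iterable[int]
) -> Iterator[list[PrefixExample]]:
    """Yield training subsets whose cap on omitted tokens grows stage by stage."""
    pool = list(examples)
    for cap in caps:
        yield [ex for ex in pool if ex.omitted <= cap]


# Hypothetical question/SQL pair, for illustration only.
examples = make_prefixes(
    "how many rivers are in texas",
    "SELECT COUNT(*) FROM river WHERE state = 'texas'",
)
# Assumed schedule: start with complete questions, then allow shorter prefixes.
stages = list(curriculum_stages(examples, caps=[0, 2, 5]))
```

Each stage is a superset of the previous one, so a model trained stage by stage sees nearly complete questions first and the shortest prefixes last.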
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Attention Dropout · Byte Pair Encoding · Gated Linear Unit · Residual Connection · Softmax · Inverse Square Root Schedule
