Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning

Zhili Shen, Pavlos Vougiouklis, Chenxin Diao, Kaustubh Vyas, Yuanyi Ji, and Jeff Z. Pan

arXiv:2407.03227 · cs.CL · November 5, 2024 · 1 citation

TL;DR

This paper introduces ASTReS, a retrieval-augmented approach for Text-to-SQL parsing that uses AST-based ranking and schema pruning to improve performance on monolingual and cross-lingual benchmarks.

Contribution

The paper presents ASTReS, a method that combines dynamic retrieval of database schema information, AST-based selection of few-shot examples, and a lightweight semantic parser that approximates the target SQL, with the aim of efficient and more accurate Text-to-SQL translation.

Findings

01. ASTReS outperforms state-of-the-art baselines on multiple benchmarks.
02. Schema pruning and AST-based ranking significantly enhance retrieval accuracy.
03. A small, efficient model effectively supports parallel schema processing.
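
To make the idea of schema pruning concrete, here is a minimal Python sketch. It assumes that an approximated SQL query is already available and keeps only the tables and columns whose names appear in it. The function name `prune_schema`, the dictionary layout of the schema, and the name-matching rule are all illustrative assumptions; this page does not describe how ASTReS actually performs its retrieval.

```python
import re


def prune_schema(schema, approx_sql):
    """Keep only tables and columns whose names occur in approx_sql.

    `schema` maps table name -> list of column names (hypothetical layout).
    Matching is a plain case-insensitive name lookup, an assumption made
    for illustration only.
    """
    # Identifier-like tokens found in the approximated query, lower-cased.
    tokens = set(re.findall(r"[a-z_][a-z0-9_]*", approx_sql.lower()))
    pruned = {}
    for table, columns in schema.items():
        kept = [col for col in columns if col.lower() in tokens]
        # Retain a table if it is named directly or any of its columns is.
        if table.lower() in tokens or kept:
            pruned[table] = kept
    return pruned


# Toy schema and approximated query, both invented for this example.
schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "venue", "year"],
    "stadium": ["stadium_id", "capacity"],
}
approx_sql = "SELECT name FROM singer WHERE age > 30"
print(prune_schema(schema, approx_sql))
```

A pruned schema of this kind shortens the prompt passed to the downstream model, which is one plausible reason pruning would matter for large commercial schemata.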

Abstract

We focus on Text-to-SQL semantic parsing from the perspective of retrieval-augmented generation. Motivated by challenges related to the size of commercial database schemata and the deployability of business intelligence solutions, we propose ASTReS, which dynamically retrieves input database information and uses abstract syntax trees to select few-shot examples for in-context learning. Furthermore, we investigate the extent to which an in-parallel semantic parser can be leveraged for generating approximated versions of the expected SQL queries, to support our retrieval. We take this approach to the extreme: we adapt a model consisting of fewer than 500M parameters to act as an extremely efficient approximator, enhancing it with the ability to process schemata in a parallelised manner. We apply ASTReS to monolingual and cross-lingual benchmarks for semantic parsing,…
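
The AST-based selection of few-shot examples mentioned in the abstract can be illustrated with a small Python sketch. It assumes each SQL query has already been parsed into a nested-tuple AST with identifiers and literals masked (`"col"`, `"table"`, `"val"`), and it scores candidates by the Jaccard overlap of their subtree sets. Both the tuple format and the Jaccard measure are assumptions chosen for brevity; the abstract does not state which AST representation or similarity function ASTReS uses.

```python
def subtrees(node):
    """Return the set of all subtrees of a nested-tuple AST, leaves included."""
    found = {node}
    if isinstance(node, tuple):
        for child in node[1:]:  # node[0] is the node label, not a child
            found |= subtrees(child)
    return found


def ast_similarity(a, b):
    """Jaccard overlap between the subtree sets of two ASTs (illustrative metric)."""
    sa, sb = subtrees(a), subtrees(b)
    return len(sa & sb) / len(sa | sb)


def rank_examples(approx_ast, pool, k=2):
    """Return the k pool entries whose ASTs are most similar to approx_ast."""
    ranked = sorted(
        pool,
        key=lambda ex: ast_similarity(approx_ast, ex["ast"]),
        reverse=True,
    )
    return ranked[:k]


# AST of an approximated query such as: SELECT col FROM table WHERE col > val
approx_ast = ("select", ("cols", "col"), ("from", "table"),
              ("where", (">", "col", "val")))

# Hypothetical example pool; questions and ASTs are invented for illustration.
pool = [
    {"question": "count rows",
     "ast": ("select", ("cols", ("count", "*")), ("from", "table"))},
    {"question": "filter by equality",
     "ast": ("select", ("cols", "col"), ("from", "table"),
             ("where", ("=", "col", "val")))},
    {"question": "count per group",
     "ast": ("select", ("cols", "col", ("count", "*")), ("from", "table"),
             ("group_by", "col"))},
]

for ex in rank_examples(approx_ast, pool, k=2):
    print(ex["question"], round(ast_similarity(approx_ast, ex["ast"]), 3))
```

Masking identifiers before comparison makes the ranking depend on query structure instead of on specific table or column names, so examples from unrelated databases can still be retrieved when their SQL has a similar shape.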

Taxonomy

Topics: Advanced Database Systems and Queries · Data Mining Algorithms and Applications · Educational Technology and Assessment