Evaluating the Text-to-SQL Capabilities of Large Language Models

Nitarshan Rajkumar; Raymond Li; Dzmitry Bahdanau

arXiv:2204.00498·cs.CL·April 4, 2022·51 cites

TL;DR

This paper empirically evaluates the Text-to-SQL abilities of the Codex language model, showing strong performance on the Spider benchmark without fine-tuning and further gains from few-shot prompting with in-domain examples on GeoQuery and Scholar.

Contribution

It demonstrates that Codex, without fine-tuning, is a competitive baseline for Text-to-SQL tasks and highlights the effectiveness of few-shot prompting in improving performance.

Findings

01

Codex performs well on the Spider benchmark without fine-tuning.

02

Few-shot prompting with in-domain examples enhances Codex's performance on GeoQuery and Scholar.

03

Analysis of failure modes provides insights into limitations of Codex in Text-to-SQL tasks.

Abstract

We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.
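
To make the idea of in-domain examples "provided in the prompt" concrete, here is a minimal sketch of how such a prompt might be assembled. The prompt layout, the `build_prompt` helper, and the toy `state` schema are all illustrative assumptions, not the format used in the paper, and the call to the model itself is left out.

```python
def build_prompt(schema, examples, question):
    """Assemble a few-shot Text-to-SQL prompt as plain text.

    Hypothetical layout: schema first, then (question, SQL) pairs,
    then the new question followed by the start of a query for the
    model to complete.
    """
    parts = ["-- Database schema (illustrative):", schema.strip(), ""]
    for ex_question, ex_sql in examples:
        parts.append(f"-- Question: {ex_question}")
        parts.append(ex_sql.strip())
        parts.append("")
    parts.append(f"-- Question: {question}")
    parts.append("SELECT")  # cue the model to continue with a SQL query
    return "\n".join(parts)


# Toy schema and example pair, invented for illustration only.
SCHEMA = """
CREATE TABLE state (state_name TEXT, population INTEGER, area REAL);
"""

EXAMPLES = [
    ("How many states are there?", "SELECT COUNT(*) FROM state;"),
]

prompt = build_prompt(SCHEMA, EXAMPLES, "Which state has the largest area?")
print(prompt)
```

Passing an empty `examples` list yields a prompt with only the schema and the question, which corresponds to the setting without in-prompt examples.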

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques