AI-Assisted SQL Authoring at Industry Scale
Chandra Maddila, Negar Ghorbani, Kosay Jabre, Vijayaraghavan Murali, Edwin Kim, Parth Thakkar, Nikolay Pavlovich Laptev, Olivia Harman, Diana Hsu, Rui Abreu, Peter C. Rigby

TL;DR
This paper introduces SqlCompose, an AI system for SQL authoring at industry scale, demonstrating significant improvements over baseline models and practical deployment at Meta with widespread user adoption.
Contribution
The paper develops specialized models for SQL generation, including fill-in-the-middle techniques, and demonstrates their effectiveness and deployment at Meta, outperforming larger general models.
Findings
SqlComposeFIM outperforms baseline models by 35 percentage points in BLEU score.
The system achieves 75% accuracy in correct table name prediction.
Over 10,000 users at Meta use SqlCompose weekly, with less than 1% opting out.
Abstract
SqlCompose brings generative AI into the data analytics domain. SQL is declarative, has formal table schemas, and is often written in a non-linear manner. We address each of these challenges and develop a set of models that shows the importance of each problem. We first develop an internal SQL benchmark to perform offline tests at Meta. We evaluate how well the public Llama model performs. We attain a BLEU score of 53% and 24% for single- and multi-line predictions, respectively. This performance is consistent with prior work on imperative languages. We then fine-tune Llama on our internal data and database schemas. SqlComposeSA substantially outperforms Llama by 16 percentage points on BLEU score. SQL is often written with multiple subqueries and in a non-sequential manner. We develop SqlComposeFIM, which is aware of the context before and after the line(s) that need to be completed.…
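The fill-in-the-middle idea mentioned in the abstract can be illustrated with a minimal sketch: mask one or more lines of a query, then present the model with the text before and after the gap. The sentinel strings, the `FimExample` type, the `make_fim_example` helper, and the sample query are all hypothetical and chosen for illustration; the text here does not specify the prompt format or tokens SqlComposeFIM actually uses.

```python
from dataclasses import dataclass

# Hypothetical sentinel strings; the model's real special tokens are not given in the text.
PREFIX_TOKEN = "<PRE>"
SUFFIX_TOKEN = "<SUF>"
MIDDLE_TOKEN = "<MID>"


@dataclass
class FimExample:
    prompt: str  # context shown to the model (prefix and suffix around the gap)
    target: str  # the masked line(s) the model is expected to produce


def make_fim_example(query: str, start: int, end: int) -> FimExample:
    """Mask lines [start, end) of a query and build a prefix/suffix prompt."""
    lines = query.splitlines(keepends=True)
    if not (0 <= start < end <= len(lines)):
        raise ValueError("masked span must be a non-empty range inside the query")
    prefix = "".join(lines[:start])
    middle = "".join(lines[start:end])
    suffix = "".join(lines[end:])
    # The suffix is placed before the middle marker so that a left-to-right
    # model has seen both sides of the gap before it starts generating.
    prompt = f"{PREFIX_TOKEN}{prefix}{SUFFIX_TOKEN}{suffix}{MIDDLE_TOKEN}"
    return FimExample(prompt=prompt, target=middle)


# Illustrative query with a subquery; table and column names are made up.
QUERY = (
    "WITH recent AS (\n"
    "    SELECT user_id, ds FROM events\n"
    "    WHERE ds >= '2024-01-01'\n"
    ")\n"
    "SELECT user_id, COUNT(*) FROM recent GROUP BY user_id\n"
)

# Mask the WHERE line inside the subquery (zero-based line index 2).
example = make_fim_example(QUERY, start=2, end=3)
```

Here `example.target` holds the masked `WHERE` line, while `example.prompt` contains the rest of the query on both sides of the gap. This mirrors the non-sequential editing pattern the abstract describes, where an author returns to an earlier subquery after the outer query already exists.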
Taxonomy
Topics: Scientific Computing and Data Management · Big Data and Business Intelligence · Data Mining Algorithms and Applications
