SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads
Jiale Lao, Immanuel Trummer

TL;DR
SQLBarber is a system that uses large language models to generate realistic, customizable SQL workloads based on natural language specifications, addressing privacy and customization limitations in existing methods.
Contribution
It introduces a novel LLM-based pipeline with self-correction and Bayesian optimization for generating tailored SQL queries reflecting real-world characteristics.
Findings
Reduces query generation time by one to three orders of magnitude compared with existing methods.
Aligns generated queries more closely with the target cost distribution than existing methods do.
Successfully generates customizable SQL templates.
Abstract
Database research and development often require a large number of SQL queries for benchmarking purposes. However, acquiring real-world SQL queries is challenging due to privacy concerns, and existing SQL generation methods are limited in customization and in satisfying realistic constraints. To address this issue, we present SQLBarber, a system based on Large Language Models (LLMs) to generate customized and realistic SQL workloads. SQLBarber (i) eliminates the need for users to manually craft SQL templates in advance, while providing the flexibility to accept natural language specifications to constrain SQL templates, (ii) scales efficiently to generate large volumes of queries matching any user-defined cost distribution (e.g., cardinality and execution plan cost), and (iii) uses execution statistics from Amazon Redshift and Snowflake to derive SQL template specifications and query…
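To make point (ii) concrete, the following is a minimal sketch of filling a SQL template with predicate values until a user-defined cost histogram is satisfied. It is not SQLBarber's implementation: it uses plain random search where the paper's pipeline uses Bayesian optimization, and `toy_cost`, `bucket_of`, `fill_buckets`, and the `orders` table are hypothetical names introduced only for illustration. A real setup would replace `toy_cost` with an estimate obtained from the database, such as the cardinality or plan cost reported by `EXPLAIN`.

```python
import random


def toy_cost(threshold, table_rows=10_000):
    """Hypothetical stand-in for a database cost estimate.

    Models the cardinality of `id < threshold` on a table whose ids
    are 0..table_rows-1. A real system would query the database instead.
    """
    return max(0, min(threshold, table_rows))


def bucket_of(cost, edges):
    """Return the index i with edges[i] <= cost < edges[i+1], or None."""
    for i in range(len(edges) - 1):
        if edges[i] <= cost < edges[i + 1]:
            return i
    return None


def fill_buckets(template, edges, target_counts, max_trials=10_000, seed=0):
    """Randomly sample predicate values until every cost interval holds
    its requested number of queries, or the trial budget runs out.

    Returns the accepted queries and the per-interval shortfall.
    """
    rng = random.Random(seed)
    remaining = list(target_counts)
    queries = []
    for _ in range(max_trials):
        if not any(remaining):
            break  # every interval is full
        value = rng.randint(0, 10_000)
        b = bucket_of(toy_cost(value), edges)
        # Keep the query only if its interval still needs more queries.
        if b is not None and remaining[b] > 0:
            remaining[b] -= 1
            queries.append(template.format(value=value))
    return queries, remaining


# Assumed target: 5 cheap, 10 medium, and 20 expensive queries.
edges = [0, 100, 1_000, 10_000]
target = [5, 10, 20]
queries, shortfall = fill_buckets(
    "SELECT * FROM orders WHERE id < {value}", edges, target
)
```

Random search wastes most trials on intervals that are already full, which is one reason a guided search such as Bayesian optimization is attractive when each cost evaluation requires a call to the database.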
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Data Quality and Management
Methods: Sparse Evolutionary Training
