LASER: A Data-Centric Method for Low-Cost and Efficient SQL Rewriting based on SQL-GRPO
Jiahui Li, Tongwang Wu, Yuren Mao, Rong Kang, Tieying Zhang, Yunjun Gao

TL;DR
LASER introduces a data-centric framework that enables small language models to perform efficient SQL query rewriting, significantly improving execution performance with minimal overhead.
Contribution
The paper develops SQL-GRPO and constructs SQL-MCTS to train small models for robust, execution-aware SQL optimization, addressing data scarcity and efficiency challenges.
Findings
LASER outperforms rule-based and LLM-based approaches in execution efficiency.
SQL-MCTS is constructed with a hybrid expansion strategy targeting complex, slow queries.
LASER demonstrates strong zero-shot transferability with minimal overhead.
Abstract
Query rewriting, the process of transforming queries into semantically equivalent yet more efficient variants, is crucial for database optimization. Existing solutions predominantly rely on either rule-based heuristics or Large Language Models (LLMs). However, traditional rule-based methods lack adaptability, while LLM-based approaches incur prohibitive inference costs and privacy risks. In contrast, Small Language Models (SLMs) present a compelling middle ground, potentially offering both flexibility and efficiency. However, the development of such compact models is severely bottlenecked by the scarcity of high-quality, domain-specific training data. To bridge this gap, we introduce LASER, a data-centric framework designed to empower small models for robust SQL optimization. First, to address the scarcity of existing benchmarks and the limited optimization headroom of generic synthetic…
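To make the notion of query rewriting concrete, the sketch below compares a query with a hand-written rewrite on a toy SQLite database. It is an illustration only, not LASER's method: the tables, data, and both queries are hypothetical, and the rewrite (a correlated scalar subquery turned into a join) is a textbook transformation chosen for clarity. Matching results on one dataset do not prove semantic equivalence; here equivalence also relies on the assumption that `customers.id` is a primary key.

```python
import sqlite3


def run(conn, sql):
    """Execute a query and return its rows sorted, so results compare order-independently."""
    return sorted(conn.execute(sql).fetchall())


# Hypothetical toy schema and data, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 75.0), (12, 3, 20.0), (13, 1, 5.0);
""")

# Original form: a correlated scalar subquery evaluated per order row.
original = """
    SELECT o.id, o.amount FROM orders o
    WHERE (SELECT c.region FROM customers c WHERE c.id = o.customer_id) = 'EU'
"""

# Rewritten form: the same filter expressed as a join.
# Because customers.id is unique, each order matches at most one customer,
# so the join cannot duplicate order rows.
rewritten = """
    SELECT o.id, o.amount FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EU'
"""

original_rows = run(conn, original)
rewritten_rows = run(conn, rewritten)
same_result = original_rows == rewritten_rows
print("results match on this dataset:", same_result)
```

A result comparison like this is only a sanity check on one dataset. Whether a rewrite actually runs faster depends on the database engine, the data, and the available indexes, so it has to be measured on the target system.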
