Dialect-Agnostic SQL Parsing via LLM-Based Segmentation
Junwen An, Kabilan Mahathevan, Manuel Rigger

TL;DR
SQLFlex is a framework that combines grammar-based parsing with LLM-based segmentation to parse diverse SQL dialects robustly, working around LLMs' difficulty with hierarchical structures and their tendency to hallucinate.
Contribution
The paper introduces a hierarchical decomposition approach, built on clause and expression segmentation, that makes LLM-based SQL parsing work across multiple dialects.
Findings
SQLFlex outperforms SQLFluff on SQL linting, with an F1 score higher by 63.68%.
SQLFlex achieves a simplification rate up to 10 times better than that of SQLess.
SQLFlex parses between 91.55% and 100% of queries across eight SQL dialects.
Abstract
SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fail when encountering unrecognized dialect-specific syntax. While Large Language Models (LLMs) have shown promise in understanding SQL queries, their inherent limitations in handling hierarchical structures and their hallucination risks limit their direct applicability in parsing. To address these limitations, we propose SQLFlex, a novel query rewriting framework that integrates grammar-based parsing with LLM-based segmentation to parse diverse SQL dialects robustly. Our core idea is to decompose hierarchical parsing into sequential segmentation tasks, which better aligns with the strengths of LLMs and improves output reliability through validation checks. Specifically, SQLFlex uses clause-level…
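The idea of pairing segmentation with a validation check can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not SQLFlex's actual implementation: `stub_segment` is a hypothetical stand-in for the LLM call (a keyword split that only handles flat queries), and `validate_segments` is one plausible check, rejecting any segmentation whose pieces do not reproduce the original query.

```python
import re

# Hypothetical clause keywords, used only by the stand-in segmenter below.
CLAUSE_KEYWORDS = ("SELECT", "FROM", "WHERE", "GROUP BY", "ORDER BY", "LIMIT")


def strip_whitespace(text: str) -> str:
    """Drop all whitespace so the comparison ignores formatting differences."""
    return "".join(text.split())


def stub_segment(query: str) -> list[str]:
    """Stand-in for an LLM segmentation call (assumption, not the real system).

    Splits a flat query in front of each clause keyword. It does not handle
    subqueries or string literals that contain keywords.
    """
    alternatives = "|".join(k.replace(" ", r"\s+") for k in CLAUSE_KEYWORDS)
    boundary = r"\b(?=(?:" + alternatives + r")\b)"
    parts = re.split(boundary, query, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def validate_segments(query: str, segments: list[str]) -> bool:
    """Accept a segmentation only if its pieces reproduce the query.

    A lenient check: text that was invented or dropped is caught, while
    whitespace differences are ignored.
    """
    return strip_whitespace("".join(segments)) == strip_whitespace(query)


if __name__ == "__main__":
    query = "SELECT a FROM t WHERE x > 1 LIMIT 5"
    segments = stub_segment(query)
    print(segments)
    print(validate_segments(query, segments))
    # A segment that is not part of the query fails the check.
    print(validate_segments(query, segments + ["ORDER BY a"]))
```

Because segmentation output is a flat list of substrings rather than a nested tree, a check like this is cheap to run on every response, and a failed check could trigger a retry.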
Taxonomy
Topics: Natural Language Processing Techniques · Web Application Security Vulnerabilities · Advanced Database Systems and Queries
