End-to-end Text-to-SQL Generation within an Analytics Insight Engine
Karime Maamari, Amine Mhedhbi

TL;DR
This paper presents an end-to-end Text-to-SQL generation system built into an analytics engine. It targets three challenges: highly complex queries, low-latency generation, and domain-specific understanding. To address them, it combines large language models, external knowledge, and feedback loops.
Contribution
It introduces a hierarchical CTE-based SQL generation pipeline with external knowledge integration and an adaptation framework for continuous improvement.
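To make the idea of CTE-based composition concrete, here is a minimal sketch of how a complex query could be assembled from smaller named steps, each becoming one common table expression. This is an illustration under stated assumptions, not the authors' pipeline: the helper `compose_cte_query`, the `orders` table, and the step names are all hypothetical, and in a real system each step's SQL would presumably be produced by a language model rather than written by hand.

```python
import sqlite3


def compose_cte_query(steps, final_select):
    """Combine named sub-queries into a single WITH statement.

    steps: list of (name, sql) pairs, ordered so that each step only
           refers to base tables or to steps that come before it.
    final_select: the SELECT that reads from the last step(s).
    (Hypothetical helper, for illustration only.)
    """
    if not steps:
        return final_select
    ctes = ",\n".join(f"{name} AS (\n  {sql}\n)" for name, sql in steps)
    return f"WITH {ctes}\n{final_select}"


# Hypothetical decomposition of "which regions sold more than 100 in total?"
steps = [
    ("regional_sales",
     "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"),
    ("top_regions",
     "SELECT region, total FROM regional_sales WHERE total > 100"),
]
query = compose_cte_query(
    steps, "SELECT region, total FROM top_regions ORDER BY total DESC"
)

# Run the composed query against a small in-memory table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 80.0), ("east", 70.0), ("west", 40.0), ("north", 120.0)],
)
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

Because each step is a separately named CTE, an intermediate result can be inspected or regenerated on its own without rewriting the rest of the query.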
Findings
Effective handling of complex SQL queries
Low latency query generation suitable for ad-hoc requests
Improved accuracy through external knowledge updates
Abstract
Recent advancements in Text-to-SQL have pushed database management systems towards greater democratization of data access. Today's language models are at the core of these advancements. They enable impressive Text-to-SQL generation as experienced in the development of Distyl AI's Analytics Insight Engine. Its early deployment with enterprise customers has highlighted three core challenges. First, data analysts expect support with authoring SQL queries of very high complexity. Second, requests are ad-hoc and, as such, require low latency. Finally, generation requires an understanding of domain-specific terminology and practices. The design and implementation of our Text-to-SQL generation pipeline, powered by large language models, tackles these challenges. The core tenets of our approach rely on external knowledge that we extract in a pre-processing phase, on retrieving the…
Taxonomy
Topics: Scientific Computing and Data Management
