Datrics Text2SQL: A Framework for Natural Language to SQL Query Generation
Tetiana Gladkykh, Kyrylo Kirykov

TL;DR
Datrics Text2SQL is a retrieval-augmented framework that improves natural language to SQL query generation by leveraging structured documentation, examples, and domain rules to handle ambiguity and complex schemas.
Contribution
It introduces a novel RAG-based approach that combines knowledge bases with retrieval to enhance SQL generation accuracy without requiring SQL expertise.
Findings
Improved query accuracy over baseline models
Effective handling of ambiguous and complex queries
Demonstrated robustness across different database schemas
Abstract
Text-to-SQL systems enable users to query databases using natural language, democratizing access to data analytics. However, they face challenges in understanding ambiguous phrasing, domain-specific vocabulary, and complex schema relationships. This paper introduces Datrics Text2SQL, a Retrieval-Augmented Generation (RAG)-based framework designed to generate accurate SQL queries by leveraging structured documentation, example-based learning, and domain-specific rules. The system builds a rich Knowledge Base from database documentation and question-query examples, which are stored as vector embeddings and retrieved through semantic similarity. It then uses this context to generate syntactically correct and semantically aligned SQL code. The paper details the architecture, training methodology, and retrieval logic, highlighting how the system bridges the gap between user intent and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management · Distributed and Parallel Computing Systems · Advanced Database Systems and Queries
