Balancing Content Size in RAG-Text2SQL System

Prakhar Gurawa; Anjali Dharmik

arXiv:2502.15723 · cs.IR · March 25, 2025

Open Access

TL;DR

This paper investigates how document size and quality affect retrieval-augmented generation in Text2SQL systems, identifying optimal thresholds and strategies to reduce hallucinations and improve robustness.

Contribution

It provides a detailed analysis of the trade-offs between document size and quality in RAG-Text2SQL systems, offering strategies to optimize performance and minimize hallucinations.

Findings

01

Optimal document size thresholds identified for best performance.

02

Curated document presentation reduces hallucinations.

03

Strategies proposed to balance content richness and noise.

Abstract

Large Language Models (LLMs) have emerged as a promising solution for converting natural language queries into SQL commands, enabling seamless database interaction. However, these Text-to-SQL (Text2SQL) systems face inherent limitations, including hallucinations, outdated knowledge, and untraceable reasoning. To address these challenges, the integration of retrieval-augmented generation (RAG) with Text2SQL models has gained traction. RAG serves as a retrieval mechanism, providing essential contextual information, such as table schemas and metadata, to enhance the query generation process. Despite their potential, RAG + Text2SQL systems are sensitive to the quality and size of retrieved documents. While richer document content can improve schema relevance and retrieval accuracy, it also introduces noise, increasing the risk of hallucinations and reducing query fidelity as the prompt size of the…
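
The trade-off described in the abstract, between richer retrieved context and a larger, noisier prompt, can be illustrated with a minimal sketch. This is not the authors' method. The `SchemaDoc` structure, the `build_prompt` function, the relevance scores, and the use of a character count as a stand-in for prompt size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchemaDoc:
    """A retrieved document describing one table (hypothetical structure)."""
    table: str
    text: str
    score: float  # retrieval relevance; higher is better (assumed scale)


def build_prompt(question: str, docs: list[SchemaDoc], max_context_chars: int) -> str:
    """Pack the highest-scoring documents into the prompt within a size budget.

    Characters are used as a rough proxy for prompt size; a real system
    would more likely count tokens.
    """
    selected = []
    used = 0
    for doc in sorted(docs, key=lambda d: d.score, reverse=True):
        if used + len(doc.text) > max_context_chars:
            continue  # skip any document that would overflow the budget
        selected.append(doc)
        used += len(doc.text)
    context = "\n\n".join(f"-- {d.table}\n{d.text}" for d in selected)
    return f"Schema context:\n{context}\n\nQuestion: {question}\nSQL:"


docs = [
    SchemaDoc("orders", "orders(id, customer_id, total)", 0.9),
    SchemaDoc("customers", "customers(id, name)", 0.8),
    SchemaDoc("audit_log", "audit_log(" + "x" * 200 + ")", 0.3),
]
prompt = build_prompt("Total spend per customer?", docs, max_context_chars=80)
```

Under these assumptions, lowering `max_context_chars` keeps only the most relevant schema text and reduces noise, while raising it admits larger, lower-scoring documents such as the hypothetical `audit_log` entry.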

Taxonomy

Topics: Advanced Computational Techniques and Applications · Educational Technology and Assessment · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Weight Decay · Linear Layer · Layer Normalization · Byte Pair Encoding · WordPiece · Dense Connections · Attention Dropout · Residual Connection