BAPPA: Benchmarking Agents, Plans, and Pipelines for Automated Text-to-SQL Generation

Fahim Ahmed; Md Mubtasim Ahasan; Jahir Sadik Monon; Muntasir Wahed; M Ashraful Amin; A K M Mahbubur Rahman; Amin Ahsan Ali

arXiv:2511.04153 · cs.CL · November 7, 2025

TL;DR

This paper benchmarks three multi-agent LLM pipelines for Text-to-SQL across small to large open-source models and reports accuracy gains of up to 10.6% from multi-round discussion.

Contribution

The paper introduces and systematically evaluates three novel multi-agent LLM pipelines for Text-to-SQL (multi-agent discussion, Planner-Coder, and Coder-Aggregator), highlighting their effectiveness over existing methods.

Findings

1. Multi-agent discussion improves small model performance.
2. The Reasoner-Coder pipeline achieves the highest accuracy.
3. Up to 10.6% accuracy increase with multi-round discussions.

Abstract

Text-to-SQL systems provide a natural language interface that can enable even laymen to access information stored in databases. However, existing Large Language Models (LLMs) struggle with SQL generation from natural instructions due to large schema sizes and complex reasoning. Prior work often focuses on complex, somewhat impractical pipelines using flagship models, while smaller, efficient models remain overlooked. In this work, we explore three multi-agent LLM pipelines, with systematic performance benchmarking across a range of small to large open-source models: (1) Multi-agent discussion pipeline, where agents iteratively critique and refine SQL queries, and a judge synthesizes the final answer; (2) Planner-Coder pipeline, where a thinking model planner generates stepwise SQL generation plans and a coder synthesizes queries; and (3) Coder-Aggregator pipeline, where multiple coders…
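The control flow of pipeline (1), multi-agent discussion, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `discuss`, the prompt wording, the default of three rounds, and the stub agents `echo_agent` and `first_candidate_judge` are all hypothetical names chosen for illustration. An "agent" here is any callable from prompt to reply; in practice it would wrap an LLM call.

```python
from typing import Callable, List

# Hypothetical abstraction: an agent maps a prompt string to a reply string.
Agent = Callable[[str], str]


def discuss(question: str, schema: str, agents: List[Agent],
            judge: Agent, rounds: int = 3) -> str:
    """Agents draft SQL, refine it over several rounds while seeing
    their peers' queries, and a judge returns the final answer."""
    base = f"Schema:\n{schema}\nQuestion: {question}\n"

    # Initial round: every agent drafts a query independently.
    drafts = [agent(base + "Write a SQL query.") for agent in agents]

    # Discussion rounds: each agent critiques and refines, given peer drafts.
    for _ in range(rounds):
        revised = []
        for i, agent in enumerate(agents):
            peers = "\n".join(d for j, d in enumerate(drafts) if j != i)
            prompt = (base
                      + f"Your query:\n{drafts[i]}\n"
                      + f"Peer queries:\n{peers}\n"
                      + "Critique these and return a refined SQL query.")
            revised.append(agent(prompt))
        drafts = revised

    # The judge sees all final candidates and synthesizes one answer.
    candidates = "\n".join(drafts)
    return judge(base + f"Candidate queries:\n{candidates}\n"
                 + "Return the final SQL query.")


def echo_agent(sql: str) -> Agent:
    """Stub agent (assumption for the demo): always replies with `sql`."""
    return lambda prompt: sql


def first_candidate_judge(prompt: str) -> str:
    """Stub judge (assumption for the demo): picks the first candidate."""
    lines = prompt.splitlines()
    return lines[lines.index("Candidate queries:") + 1]


if __name__ == "__main__":
    agents = [echo_agent("SELECT COUNT(*) FROM users;"),
              echo_agent("SELECT COUNT(id) FROM users;")]
    print(discuss("How many users?", "users(id, name)",
                  agents, first_candidate_judge, rounds=2))
```

Replacing the stubs with real model calls leaves the loop unchanged; the number of rounds and how much peer context each agent sees are the main knobs in a design like this.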

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling