Benchmarking Text-to-Python against Text-to-SQL: The Impact of Explicit Logic and Ambiguity
Hangle Hu, Chenyu Hou, Bin Cao, Ruizhe Li

TL;DR
This paper introduces BIRD-Python, a benchmark for evaluating Text-to-Python systems, analyzes how Text-to-Python differs from Text-to-SQL, and proposes a framework for handling ambiguity. It shows that, with proper grounding, Python can match SQL performance.
Contribution
The paper presents BIRD-Python, a standardized benchmark for Text-to-Python, and introduces the Logic Completion Framework to address ambiguity, enabling performance parity with Text-to-SQL.
Findings
Performance gaps between Text-to-Python and Text-to-SQL are mainly due to missing domain context.
Addressing ambiguity allows Text-to-Python to match Text-to-SQL performance.
Explicit grounding of natural language improves code generation accuracy.
Abstract
While Text-to-SQL remains the dominant approach for database interaction, real-world analytics increasingly require the flexibility of general-purpose programming languages such as Python or Pandas to manage file-based data and complex analytical workflows. Despite this growing need, the reliability of Text-to-Python in core data retrieval remains underexplored relative to the mature SQL ecosystem. To address this gap, we introduce BIRD-Python, a benchmark designed for cross-paradigm evaluation. We systematically refined the original dataset to reduce annotation noise and align execution semantics, thereby establishing a consistent and standardized baseline for comparison. Our analysis reveals a fundamental paradigmatic divergence: whereas SQL leverages implicit DBMS behaviors through its declarative structure, Python requires explicit procedural logic, making it highly sensitive to…
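The contrast between implicit DBMS behavior and explicit procedural logic can be illustrated with a small sketch. It is not taken from the paper: the `scores` table, its columns, and the choice of NULL handling as the example are assumptions made purely for illustration. SQL's `AVG` skips NULL values without being told to, while the Python version has to state that rule itself, and a plausible but different reading of "average" gives a different answer.

```python
import sqlite3

# Hypothetical toy data (not from BIRD-Python): one score is missing.
rows = [("a", 80.0), ("b", None), ("c", 100.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

# Declarative: AVG ignores NULL values implicitly.
sql_avg = conn.execute("SELECT AVG(score) FROM scores").fetchone()[0]

# Procedural: the same rule has to be written out explicitly.
present = [score for _, score in rows if score is not None]
py_avg = sum(present) / len(present)

# A different reading of the question: treat a missing score as zero.
naive_avg = sum(score or 0.0 for _, score in rows) / len(rows)

print(sql_avg, py_avg, naive_avg)
conn.close()
```

Here the first two values agree and the third does not, which is the kind of ambiguity that a natural-language question leaves open when the code must carry every rule explicitly.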
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Digital Humanities and Scholarship
