Business Logic-Driven Text-to-SQL Data Synthesis for Business Intelligence
Jinhui Liu, Ximeng Zhang, Yanbo Ai, Zhou Yu

TL;DR
This paper introduces a business logic-driven data synthesis framework for Text-to-SQL tasks, generating realistic, complex business questions grounded in workflows to improve evaluation and reveal model limitations.
Contribution
It presents a novel data synthesis method that incorporates business personas and reasoning complexity, significantly enhancing data realism for Text-to-SQL evaluation.
Findings
Synthesized data achieves 98.44% business realism
Outperforms OmniSQL by 19.5% and SQL-Factory by 54.7% in business realism
State-of-the-art models reach only 42.86% accuracy on complex queries
Abstract
Evaluating Text-to-SQL agents in private business intelligence (BI) settings is challenging due to the scarcity of realistic, domain-specific data. While synthetic evaluation data offers a scalable solution, existing generation methods fail to capture business realism, that is, whether questions reflect realistic business logic and workflows. We propose a Business Logic-Driven Data Synthesis framework that generates data grounded in business personas, work scenarios, and workflows. In addition, we improve the data quality by imposing a business reasoning complexity control strategy that diversifies the analytical reasoning steps required to answer the questions. Experiments on a production-scale Salesforce database show that our synthesized data achieves high business realism (98.44%), substantially outperforming OmniSQL (+19.5%) and SQL-Factory (+54.7%), while maintaining strong question-SQL…
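The abstract describes grounding each generated question in a persona, a work scenario, and a workflow, with a separate control over how many analytical reasoning steps the question requires. The sketch below is only an illustration of that idea, not the authors' implementation: the `QuestionSpec` fields, the `build_specs` and `render_prompt` helpers, the example contexts, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSpec:
    """One generation request. All field names are hypothetical."""
    persona: str          # who is asking
    scenario: str         # the work situation they are in
    workflow: str         # the task they are carrying out
    reasoning_steps: int  # target number of analytical reasoning steps


def build_specs(contexts, step_levels):
    """Pair every (persona, scenario, workflow) context with every complexity level."""
    return [
        QuestionSpec(persona, scenario, workflow, steps)
        for (persona, scenario, workflow) in contexts
        for steps in step_levels
    ]


def render_prompt(spec):
    """Turn a spec into a prompt for a question-generating model (assumed wording)."""
    return (
        f"You are a {spec.persona}. Scenario: {spec.scenario}. "
        f"Workflow: {spec.workflow}. "
        f"Write one business question that needs {spec.reasoning_steps} "
        f"analytical reasoning step(s) to answer with SQL."
    )


# Illustrative contexts only; they are not taken from the paper.
contexts = [
    ("sales operations manager", "quarterly pipeline review", "identify stalled opportunities"),
    ("customer success lead", "renewal planning", "flag accounts at risk of churn"),
]

specs = build_specs(contexts, step_levels=(1, 2, 3))
prompts = [render_prompt(spec) for spec in specs]
```

Crossing contexts with complexity levels is one simple way to spread questions across reasoning depths; the paper's actual control strategy may differ.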
Taxonomy
Topics: Persona Design and Applications · Business Process Modeling and Analysis · Data Quality and Management
