Data Generation for Testing Complex Queries
Sunanda Somwase, Parismita Das, S. Sudarshan

TL;DR
This paper introduces a data generation method that creates test data for complex SQL queries, going beyond earlier approaches that handled only single-block and simple nested queries.
Contribution
The paper presents a novel data generation approach designed for complex SQL queries. It is effective on queries where the earlier XData approach is not as effective, and it can outperform the state-of-the-art VeriEQL system at showing that two queries are not equivalent.
Findings
Generates effective test data for complex SQL queries.
Remains effective on complex queries where XData is not as effective.
Can outperform VeriEQL at showing non-equivalence of queries.
Abstract
Generation of sample data for testing SQL queries has been an important task for many years, with applications such as testing SQL queries used for data analytics and in application software, as well as testing student SQL queries. More recently, with the increasing use of text-to-SQL systems, test data has become key for validating generated queries. Earlier work on test data generation handled basic single-block SQL queries, as well as simple nested SQL queries, but could not handle more complex queries. In this paper, we present a novel data generation approach that is designed to handle complex queries, and show its effectiveness on queries for which the earlier XData approach is not as effective. We also show that it can outperform the state-of-the-art VeriEQL system in showing non-equivalence of queries.
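To make concrete what it means for test data to show non-equivalence of two queries, here is a minimal sketch in Python using the standard `sqlite3` module. It is not the paper's generation method: the schema, both queries, the datasets, and the helper names `run` and `distinguishes` are all hypothetical, chosen for illustration. The two queries differ only in a comparison operator inside a nested subquery, so only a dataset containing the boundary value tells them apart.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp(id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
"""

# Hypothetical reference query: departments with an employee earning at least 100.
Q_REFERENCE = """
SELECT d.name FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id AND e.salary >= 100)
ORDER BY d.name
"""

# Hypothetical candidate query with a subtle error: > instead of >=.
Q_CANDIDATE = """
SELECT d.name FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id AND e.salary > 100)
ORDER BY d.name
"""


def run(dept_rows, emp_rows, query):
    """Load the given rows into a fresh in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO dept VALUES (?, ?)", dept_rows)
        conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", emp_rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def distinguishes(dept_rows, emp_rows):
    """Return True if the dataset makes the two queries produce different results."""
    return run(dept_rows, emp_rows, Q_REFERENCE) != run(dept_rows, emp_rows, Q_CANDIDATE)


dept_rows = [(1, "sales")]
weak_emp_rows = [(1, 1, 150)]      # no boundary value: both queries agree
boundary_emp_rows = [(1, 1, 100)]  # salary exactly 100: the queries disagree

print(distinguishes(dept_rows, weak_emp_rows))      # False
print(distinguishes(dept_rows, boundary_emp_rows))  # True
```

The first dataset fails to expose the difference, while the second acts as a witness of non-equivalence. A test data generator aims to produce datasets of the second kind automatically.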
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Data Management and Algorithms
