LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search
Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao

TL;DR
This paper introduces MIST, a novel framework combining LLMs with Monte Carlo Tree Search to generate diverse, syntactically valid SQL test cases for DBMS testing, significantly improving code coverage.
Contribution
MIST is the first framework to integrate feature-guided synthetization and MCTS-based mutation for LLM-driven SQL test generation in DBMS testing.
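The contribution pairs LLM-driven generation with MCTS-based mutation. The sketch below is a minimal, hypothetical illustration of the generic Monte Carlo Tree Search loop (UCT selection, expansion, evaluation, backpropagation) applied to query mutation. It is not MIST's implementation. Assumptions: the LLM mutator is replaced by a toy operator that adds one SQL clause, and real DBMS coverage is replaced by a simulated function; `render`, `simulated_coverage`, `Node` and `mcts` are illustrative names only.

```python
import math
import random
from itertools import combinations

# Hypothetical toy setup (not from the paper): a query is "SELECT a FROM t"
# plus an optional set of clauses. Adding a clause stands in for an LLM mutation.
CLAUSES = {"where": "WHERE a > 1", "group": "GROUP BY a",
           "order": "ORDER BY a", "limit": "LIMIT 10"}
CLAUSE_ORDER = ["where", "group", "order", "limit"]  # canonical SQL clause order


def render(state: frozenset) -> str:
    """Render a clause set as a syntactically valid SQL string."""
    return " ".join(["SELECT a FROM t"] + [CLAUSES[c] for c in CLAUSE_ORDER if c in state])


def simulated_coverage(state: frozenset) -> set:
    """Stand-in for DBMS coverage: single clauses plus clause pairs ("deeper" paths)."""
    return set(state) | set(combinations(sorted(state), 2))


class Node:
    def __init__(self, state: frozenset, parent=None):
        self.state, self.parent = state, parent
        self.children = {}  # mutation (clause name) -> child Node
        self.untried = [c for c in CLAUSE_ORDER if c not in state]
        self.visits, self.value = 0, 0.0

    def uct_child(self, c: float = 1.4) -> "Node":
        # UCT score: mean reward plus an exploration bonus for rarely visited children.
        return max(self.children.values(),
                   key=lambda n: n.value / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))


def mcts(iterations: int = 50, seed: int = 0):
    rng = random.Random(seed)
    root = Node(frozenset())
    covered = set()  # global (simulated) coverage seen so far
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: apply one untried mutation (add a clause).
        if node.untried:
            clause = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.state | {clause}, parent=node)
            node.children[clause] = child
            node = child
        # 3. Evaluation: reward is the number of newly covered items (no rollout).
        new_items = simulated_coverage(node.state) - covered
        covered |= new_items
        reward = len(new_items)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root, covered


if __name__ == "__main__":
    root, covered = mcts(iterations=50)
    best = max(root.children.values(), key=lambda n: n.visits)
    print(render(best.state), len(covered))
```

Rewarding only newly covered items is one simple way to push the search away from queries that revisit already-exercised paths, which is the coverage-plateau problem the abstract describes; a real system would obtain coverage from an instrumented DBMS build.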
Findings
Achieved a 43.3% increase in line coverage
Improved branch coverage by 46.4%
Enhanced SQL query diversity and validity
Abstract
Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reliability. Traditional approaches such as fuzzing can be effective for specific DBMSs, but adapting them to different proprietary dialects requires substantial manual effort. Large Language Models (LLMs) present promising opportunities for automated SQL test generation, but face critical challenges in industrial environments. First, lightweight models are widely used in organizations due to security and privacy constraints, but they struggle to generate syntactically valid queries for proprietary SQL dialects. Second, LLM-generated queries are often semantically similar and exercise only shallow execution paths, thereby quickly reaching a coverage plateau. To address these challenges, we propose…
Taxonomy
Topics: Software Testing and Debugging Techniques · Advanced Database Systems and Queries · Software System Performance and Reliability
