Hallucination Detection for LLM-based Text-to-SQL Generation via Two-Stage Metamorphic Testing
Bo Yang, Yinfen Xia, Weisong Sun, Yang Liu

TL;DR
This paper introduces SQLHD, a two-stage metamorphic testing approach for detecting hallucinations in LLM-generated Text-to-SQL queries without needing ground-truth answers. It outperforms LLM self-evaluation baselines in detection accuracy.
Contribution
The paper proposes a novel metamorphic testing-based method, SQLHD, for hallucination detection in Text-to-SQL tasks that does not rely on ground-truth data and outperforms existing methods.
Findings
Achieves hallucination-detection F1-scores between 69.36% and 82.76%.
Outperforms LLM Self-Evaluation methods in hallucination detection.
Effective in identifying hallucinations without ground-truth answers.
Abstract
In Text-to-SQL generation, large language models (LLMs) have shown strong generalization and adaptability. However, LLMs sometimes generate hallucinations, i.e., unrealistic or illogical content, which lead to incorrect SQL queries and negatively impact downstream applications. Detecting these hallucinations is particularly challenging. Existing Text-to-SQL error detection methods, which are tailored for traditional deep learning models, face significant limitations when applied to LLMs, primarily because ground-truth data is scarce. To address this challenge, we propose SQLHD, a novel hallucination detection method based on metamorphic testing (MT) that does not require standard answers. SQLHD splits the detection task into two sequential stages: schema-linking hallucination detection via eight structure-aware Metamorphic Relations (MRs) that perturb comparative words,…
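The abstract describes metamorphic relations that perturb comparative words in the natural-language question. As a hedged illustration of that general idea only, the sketch below implements one hypothetical MR: swapping a phrase such as "more than" for "less than" should flip the comparison operator in the generated SQL and leave the rest of the query unchanged; any other difference is flagged as a possible hallucination. The swap table, the `generate_sql` callable (a stand-in for the LLM), and all function names are assumptions for illustration, not SQLHD's actual MRs or code.

```python
import re
from typing import Callable, Optional

# Hypothetical comparative-word swaps (an assumption, not the paper's MR list).
COMPARATIVE_SWAPS = {
    "more than": "less than",
    "less than": "more than",
    "greater than": "smaller than",
    "smaller than": "greater than",
}

# Operators expected to flip when the comparative word flips; "<>" is left alone.
OPERATOR_FLIPS = {">": "<", "<": ">", ">=": "<=", "<=": ">="}
_OP_RE = re.compile(r"<>|>=|<=|>|<")
_SWAP_RE = re.compile("|".join(re.escape(k) for k in COMPARATIVE_SWAPS), re.IGNORECASE)


def perturb_question(question: str) -> Optional[str]:
    """Build the follow-up question by swapping the first comparative phrase."""
    match = _SWAP_RE.search(question)
    if match is None:
        return None  # this MR does not apply to the question
    replacement = COMPARATIVE_SWAPS[match.group(0).lower()]
    return question[: match.start()] + replacement + question[match.end():]


def flip_operators(sql: str) -> str:
    """Flip every ordering comparison operator in a SQL string."""
    return _OP_RE.sub(lambda m: OPERATOR_FLIPS.get(m.group(0), m.group(0)), sql)


def normalize(sql: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def check_question(question: str, generate_sql: Callable[[str], str]) -> Optional[bool]:
    """Return True if the MR is violated, False if satisfied, None if not applicable."""
    followup_question = perturb_question(question)
    if followup_question is None:
        return None
    source_sql = generate_sql(question)
    followup_sql = generate_sql(followup_question)
    # Undo the expected operator flip; anything still different breaks the relation.
    return normalize(flip_operators(followup_sql)) != normalize(source_sql)


if __name__ == "__main__":
    # A canned dictionary stands in for the LLM in this hypothetical example.
    canned = {
        "Which employees earn more than 5000?": "SELECT name FROM emp WHERE salary > 5000",
        "Which employees earn less than 5000?": "SELECT name FROM emp WHERE salary < 5000",
    }
    print(check_question("Which employees earn more than 5000?", canned.__getitem__))  # False
```

A violation here only signals an inconsistency between the two generated queries; string comparison after normalization is a deliberately crude stand-in for whatever structural comparison a real detector would use.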
Taxonomy
Topics: Web Application Security Vulnerabilities · Natural Language Processing Techniques · Topic Modeling
