Hallucination Detection for LLM-based Text-to-SQL Generation via Two-Stage Metamorphic Testing

Bo Yang; Yinfen Xia; Weisong Sun; Yang Liu

arXiv:2512.22250 · cs.SE · December 30, 2025

TL;DR

This paper introduces SQLHD, a two-stage metamorphic testing approach that detects hallucinations in LLM-generated Text-to-SQL queries without ground-truth answers and reports F1-scores between 69.36% and 82.76%.

Contribution

The paper proposes SQLHD, a metamorphic testing-based method for hallucination detection in Text-to-SQL tasks that does not rely on ground-truth data and outperforms LLM Self-Evaluation methods.

Findings

01. SQLHD achieves F1-scores between 69.36% and 82.76%.

02. It outperforms LLM Self-Evaluation methods in hallucination detection.

03. It identifies hallucinations without requiring ground-truth answers.

Abstract

In Text-to-SQL generation, large language models (LLMs) have shown strong generalization and adaptability. However, LLMs sometimes generate hallucinations, i.e., unrealistic or illogical content, which leads to incorrect SQL queries and negatively impacts downstream applications. Detecting these hallucinations is particularly challenging. Existing Text-to-SQL error detection methods, which are tailored for traditional deep learning models, face significant limitations when applied to LLMs. This is primarily due to the scarcity of ground-truth data. To address this challenge, we propose SQLHD, a novel hallucination detection method based on metamorphic testing (MT) that does not require standard answers. SQLHD splits the detection task into two sequential stages: schema-linking hallucination detection via eight structure-aware Metamorphic Relations (MRs) that perturb comparative words,…
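The general idea of metamorphic testing without ground truth can be illustrated with a minimal Python sketch. It is not SQLHD itself and does not reproduce any of the paper's eight MRs, which the truncated abstract does not spell out. The relation used here (swapping "more than" for "at most" should produce complementary result sets) is an assumed example, and `faithful_model`, `faulty_model`, `violates_complement_mr`, and the `employees` table are hypothetical stand-ins for an LLM and a real database.

```python
import sqlite3

QUESTION = "Which employees earn more than 60?"


def make_db():
    # Tiny in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Ann", 50), ("Bob", 70), ("Cy", 90)],
    )
    return conn


def faithful_model(question):
    # Hypothetical stand-in for an LLM that respects the comparative word.
    op = ">" if "more than" in question else "<="
    return f"SELECT name FROM employees WHERE salary {op} 60"


def faulty_model(question):
    # Hypothetical stand-in that ignores the comparative word entirely.
    return "SELECT name FROM employees WHERE salary > 60"


def run(conn, sql):
    # Execute a query and collect the first column as a set.
    return {row[0] for row in conn.execute(sql)}


def violates_complement_mr(conn, model, question):
    # Assumed relation: replacing "more than" with "at most" should split
    # the table into two disjoint halves that together cover every row
    # (this holds only when the compared column contains no NULLs).
    follow_up = question.replace("more than", "at most")
    source = run(conn, model(question))
    mutated = run(conn, model(follow_up))
    everyone = run(conn, "SELECT name FROM employees")
    holds = source.isdisjoint(mutated) and (source | mutated) == everyone
    return not holds


if __name__ == "__main__":
    conn = make_db()
    print("faithful flagged:", violates_complement_mr(conn, faithful_model, QUESTION))
    print("faulty flagged:", violates_complement_mr(conn, faulty_model, QUESTION))
```

The check compares two outputs of the same model against each other, so no reference SQL is needed. A violation only signals that at least one of the two queries is suspect; it does not say which one.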

Taxonomy

Topics: Web Application Security Vulnerabilities · Natural Language Processing Techniques · Topic Modeling