Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
Aditya Sharma, Christopher J. Pal, Amal Zouaq

TL;DR
This paper introduces PGMR, a modular framework that reduces hallucinations in LLM-generated SPARQL queries by adding a post-generation memory retrieval step that resolves knowledge graph (KG) URIs, improving query correctness and robustness.
Contribution
The paper presents PGMR, a novel approach combining LLMs with a non-parametric memory module to significantly reduce URI hallucinations in SPARQL query generation.
Findings
PGMR substantially improves query correctness (SQM) across LLMs, datasets, and distribution shifts.
It nearly eliminates URI hallucinations in generated queries.
The framework maintains performance even with noisy, large memory modules.
Abstract
The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KG). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when generating KG elements, such as Uniform Resource Identifiers (URIs), based on opaque internal parametric knowledge. We propose PGMR (Post-Generation Memory Retrieval), a modular framework where the LLM produces an intermediate query using natural language placeholders for URIs, and a non-parametric memory module is subsequently employed to retrieve and resolve the correct KG URIs. PGMR significantly enhances query correctness (SQM) across various LLMs, datasets, and distribution shifts, while achieving the near-complete suppression of URI…
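The two-stage idea in the abstract (an intermediate query with natural-language placeholders, then URI resolution against a non-parametric memory) can be illustrated with a minimal sketch. It is not the authors' implementation: the `[[label]]` placeholder syntax, the function names, and the tiny label-to-URI memory are all hypothetical, and `difflib` string similarity stands in for whatever retriever PGMR actually uses. The point it shows is that the LLM never has to emit a URI itself; every URI in the final query comes from the memory.

```python
import re
from difflib import SequenceMatcher

# Hypothetical placeholder syntax: the LLM writes [[natural-language label]]
# wherever a KG URI would go in the final SPARQL query.
PLACEHOLDER = re.compile(r"\[\[(.+?)\]\]")

# Hypothetical non-parametric memory: KG labels mapped to DBpedia-style URIs.
EXAMPLE_MEMORY = {
    "Barack Obama": "http://dbpedia.org/resource/Barack_Obama",
    "spouse": "http://dbpedia.org/ontology/spouse",
    "birth place": "http://dbpedia.org/ontology/birthPlace",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in retriever)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(label: str, memory: dict[str, str]) -> str:
    """Return the URI whose stored label is most similar to `label`."""
    if not memory:
        raise ValueError("memory is empty; cannot resolve placeholder")
    best_label = max(memory, key=lambda stored: similarity(label, stored))
    return memory[best_label]


def post_generation_resolve(intermediate_query: str, memory: dict[str, str]) -> str:
    """Replace every [[label]] placeholder with the retrieved URI in <...> form."""
    return PLACEHOLDER.sub(
        lambda m: f"<{resolve(m.group(1), memory)}>", intermediate_query
    )


if __name__ == "__main__":
    draft = "SELECT ?x WHERE { [[barack obama]] [[spouse]] ?x }"
    print(post_generation_resolve(draft, EXAMPLE_MEMORY))
    # SELECT ?x WHERE { <http://dbpedia.org/resource/Barack_Obama> <http://dbpedia.org/ontology/spouse> ?x }
```

Because retrieval picks the closest stored label rather than requiring an exact match, a slightly different surface form such as `birthplace` still resolves to the `birth place` entry. A real memory would hold many more (and noisier) entries, which is where the choice of retriever matters.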
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
