SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection

Aleksandr Gashkov; Aleksandr Perevalov; Maria Eltsova; Andreas Both

arXiv:2507.13859 · cs.IR · July 21, 2025

TL;DR

This paper introduces a method to evaluate how training data influences LLM performance in generating SPARQL queries for QA over knowledge graphs, addressing issues of memorization and knowledge injection.

Contribution

It proposes a novel evaluation approach for LLMs in KGQA that isolates training data effects, including knowledge injection, across various conditions.

Findings

01 Assessing training data influence on LLM performance
02 Identifying memorization effects in benchmark datasets
03 Providing a portable, robust evaluation method

Abstract

Nowadays, the importance of software with natural-language user interfaces should not be underestimated. In Question Answering (QA) systems working over Knowledge Graphs (KGQA), the central task is generating a SPARQL query for a given natural-language question (often called Query Building) from the information retrieved from that question. With the rise of Large Language Models (LLMs), these models are considered well suited to improving question-answering functionality, where there is still considerable room for improvement in quality and trustworthiness. However, LLMs are trained on web data, and researchers have no control over whether the benchmark or the knowledge graph was already included in the training data. In this paper, we introduce a novel method that evaluates the quality of LLMs by generating a SPARQL query…
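To make the Query Building task concrete, the following is a minimal sketch of how a question might be turned into an LLM prompt, with and without identifiers supplied in the prompt. It is not the authors' method: the function name `build_query_prompt`, the prompt wording, and the idea of modelling knowledge injection as a list of identifiers are all assumptions made for illustration. The reference query is hand-written against Wikidata (Q183 is Germany, P36 is the "capital" property) and is not model output.

```python
from typing import Mapping, Optional


def build_query_prompt(question: str,
                       injected: Optional[Mapping[str, str]] = None) -> str:
    """Assemble a prompt asking an LLM for a SPARQL query (hypothetical helper).

    `injected` maps surface labels to knowledge-graph identifiers. Supplying it
    is one assumed way to model "knowledge injection"; omitting it forces the
    model to rely on whatever identifiers it absorbed during training.
    """
    lines = [
        "Translate the question into a SPARQL query over Wikidata.",
        f"Question: {question}",
    ]
    if injected:
        lines.append("Known identifiers:")
        for label, identifier in injected.items():
            lines.append(f"- {label}: {identifier}")
    lines.append("SPARQL:")
    return "\n".join(lines)


# Hand-written reference query for the example question (not model output).
REFERENCE_QUERY = "SELECT ?capital WHERE { wd:Q183 wdt:P36 ?capital . }"

question = "What is the capital of Germany?"

# Condition 1: no identifiers given, so a correct answer depends on memorized IDs.
plain_prompt = build_query_prompt(question)

# Condition 2: identifiers injected into the prompt.
injected_prompt = build_query_prompt(
    question,
    {"Germany": "wd:Q183", "capital": "wdt:P36"},
)
```

Comparing a model's output under the two prompts against `REFERENCE_QUERY` is one simple way to see whether it depends on identifiers it already knows or on those it was given.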

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications