Assessing SPARQL capabilities of Large Language Models
Lars-Peter Meyer, Johannes Frey, Felix Brei, Natanael Arndt

TL;DR
This paper evaluates the ability of large language models to understand and generate SPARQL SELECT queries for knowledge graphs, revealing current limitations and dependencies on model type and query complexity.
Contribution
Introduces new benchmarking tasks, implemented in the LLM-KG-Bench framework, for assessing LLMs' capabilities with SPARQL, providing systematic evaluation across syntax and semantics.
Findings
Fixing basic syntax errors poses little difficulty for the best of the evaluated LLMs.
Semantic correctness in SPARQL queries remains challenging.
Performance varies significantly across models and query complexity.
Abstract
The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such as those used in the Semantic Web, with SPARQL being a core technology for accessing KGs. In this paper, we focus on measuring out-of-the-box capabilities of LLMs to work with SPARQL, and more specifically with SPARQL SELECT queries, applying a quantitative approach. We implemented various benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation with several LLMs. The tasks assess capabilities along the dimensions of syntax, semantic read, semantic create, and the role of knowledge graph prompt inclusion. With these new benchmarking tasks, we evaluated a selection of GPT, Gemini, and Claude models. Our findings indicate…
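To make the "semantic create" dimension more concrete, here is a minimal sketch of one way a generated SELECT query could be scored: run it, then compare its result rows against those of a reference query. This is an illustration under stated assumptions, not the LLM-KG-Bench implementation. The function names, the precision/recall/F1 scoring, and the sample rows are all hypothetical, and rows are assumed to be already fetched as plain dictionaries.

```python
from collections import Counter


def binding_key(row):
    """Turn one result row (variable name -> value) into a hashable key.

    Assumption: the generated query uses the same variable names as the
    reference query. A real evaluation would likely need to relax this.
    """
    return frozenset(row.items())


def score_result_sets(expected, actual):
    """Compare two SELECT result sets as multisets of rows.

    Row order is ignored, since SPARQL results are unordered unless the
    query uses ORDER BY. Returns precision, recall, and F1.
    """
    exp = Counter(binding_key(r) for r in expected)
    act = Counter(binding_key(r) for r in actual)
    overlap = sum((exp & act).values())  # rows present in both result sets
    precision = overlap / sum(act.values()) if act else 0.0
    recall = overlap / sum(exp.values()) if exp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical rows: what a reference query and an LLM-written query returned.
reference_rows = [{"name": "Alice"}, {"name": "Bob"}]
generated_rows = [{"name": "Bob"}, {"name": "Carol"}]
scores = score_result_sets(reference_rows, generated_rows)
print(scores)
```

A graded score of this kind separates a query that is nearly right from one that returns unrelated rows, which a strict equality check on the result sets would not do.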
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Residual Connection · Attention Dropout · Linear Layer · Discriminative Fine-Tuning · Multi-Head Attention · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding
