LLM-KG-Bench 3.0: A Compass for Semantic Technology Capabilities in the Ocean of LLMs

Lars-Peter Meyer; Johannes Frey; Desiree Heim; Felix Brei; Claus Stadler; Kurt Junghanns; Michael Martin

arXiv:2505.13098 · cs.AI · June 3, 2025

TL;DR

The paper introduces LLM-KG-Bench 3.0, a framework for evaluating large language models' capabilities in semantic web and knowledge graph tasks, providing a comprehensive dataset and comparison of state-of-the-art models.

Contribution

It presents an enhanced, extensible evaluation framework and a large dataset for assessing LLMs' performance in semantic web and knowledge graph engineering tasks.

Findings

01

Significant improvements in evaluation flexibility and support for open models.

02

Generated dataset includes answers from over 30 LLMs on RDF and SPARQL tasks.

03

Model comparisons highlight varying strengths in semantic technology capabilities.

Abstract

Current Large Language Models (LLMs) can assist in developing program code, among many other things, but can they support working with Knowledge Graphs (KGs) as well? Which LLM offers the best capabilities in the field of Semantic Web and Knowledge Graph Engineering (KGE)? Can this be determined without checking many answers manually? Version 3.0 of the LLM-KG-Bench framework is designed to answer these questions. It consists of an extensible set of tasks for the automated evaluation of LLM answers and covers different aspects of working with semantic technologies. This paper presents Version 3.0 of the framework along with a dataset of prompts, answers, and evaluations generated with it and several state-of-the-art LLMs. Significant enhancements have been made to the framework since its initial release, including an updated task API that offers greater flexibility…
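
The idea of an extensible task set with automated answer evaluation can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen for illustration only: the `Task` protocol, `CountSubjectsTask`, `run_benchmark`, and the stub models are assumptions, not the actual LLM-KG-Bench task API, and the toy scoring rule stands in for whatever checks the real tasks perform.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol

# Hypothetical names throughout; this is NOT the LLM-KG-Bench API.
Model = Callable[[str], str]  # maps a prompt string to an answer string


class Task(Protocol):
    """Assumed minimal interface: a task builds a prompt and scores an answer."""
    name: str

    def prompt(self) -> str: ...
    def evaluate(self, answer: str) -> dict[str, float]: ...


@dataclass
class CountSubjectsTask:
    """Toy task: count the distinct subjects in a small list of triples."""
    triples: list[tuple[str, str, str]]
    name: str = "count-subjects"

    def prompt(self) -> str:
        lines = "\n".join(f"{s} {p} {o} ." for s, p, o in self.triples)
        return ("How many distinct subjects occur in this graph? "
                "Answer with a number.\n" + lines)

    def evaluate(self, answer: str) -> dict[str, float]:
        # Score automatically: compare the first integer in the answer
        # with the count computed from the triples themselves.
        expected = len({s for s, _, _ in self.triples})
        match = re.search(r"\d+", answer)
        correct = match is not None and int(match.group()) == expected
        return {"correct": 1.0 if correct else 0.0}


def run_benchmark(models: dict[str, Model],
                  tasks: list[Task]) -> dict[tuple[str, str], dict[str, float]]:
    """Run every model on every task and collect the scores."""
    results = {}
    for model_name, model in models.items():
        for task in tasks:
            results[(model_name, task.name)] = task.evaluate(model(task.prompt()))
    return results


task = CountSubjectsTask([
    (":alice", ":knows", ":bob"),
    (":alice", ":age", "30"),
    (":bob", ":knows", ":carol"),
])
# Stub models stand in for real LLM connectors.
models = {
    "stub-right": lambda prompt: "There are 2 distinct subjects.",
    "stub-wrong": lambda prompt: "I count 3.",
}
scores = run_benchmark(models, [task])
```

In this sketch, adding a new task means writing another class with `prompt` and `evaluate`; the runner itself does not change, which is one plausible reading of "extensible" here.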

Code & Models

Repositories

aksw/llm-kg-bench
Official

Taxonomy

Methods: Sparse Evolutionary Training