TL;DR
The paper introduces LLM-KG-Bench 3.0, a framework for evaluating large language models' capabilities in semantic web and knowledge graph tasks, providing a comprehensive dataset and comparison of state-of-the-art models.
Contribution
It presents an enhanced, extensible evaluation framework and a large dataset for assessing LLMs' performance in semantic web and knowledge graph engineering tasks.
Findings
Compared with the initial release, the framework offers significantly more flexible evaluation (through an updated task API) and supports open models.
Generated dataset includes answers from over 30 LLMs on RDF and SPARQL tasks.
Model comparisons highlight varying strengths in semantic technology capabilities.
Abstract
Current Large Language Models (LLMs) can assist in developing program code, among many other things, but can they support working with Knowledge Graphs (KGs) as well? Which LLM offers the best capabilities in the field of Semantic Web and Knowledge Graph Engineering (KGE)? Can this be determined without checking many answers manually? Version 3.0 of the LLM-KG-Bench framework is designed to answer these questions. It consists of an extensible set of tasks for automated evaluation of LLM answers and covers different aspects of working with semantic technologies. This paper presents LLM-KG-Bench 3.0 along with a dataset of prompts, answers, and evaluations generated with it and several state-of-the-art LLMs. Significant enhancements have been made to the framework since its initial release, including an updated task API that offers greater flexibility…
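To make the idea of "an extensible set of tasks for automated evaluation of LLM answers" concrete, here is a minimal Python sketch. It is not the LLM-KG-Bench task API: the names `Task`, `Evaluation`, `CountTriplesTask`, and `run_benchmark` are hypothetical and chosen only for illustration, and the model is assumed to be any callable that maps a prompt string to an answer string.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Result of scoring one answer (hypothetical structure)."""
    task: str
    score: float  # 0.0 (wrong) to 1.0 (fully correct)
    info: str


class Task(ABC):
    """Hypothetical task interface: builds a prompt and scores an answer."""
    name: str

    @abstractmethod
    def prompt(self) -> str:
        ...

    @abstractmethod
    def evaluate(self, answer: str) -> Evaluation:
        ...


class CountTriplesTask(Task):
    """Toy task: ask how many triples a small N-Triples snippet contains."""
    name = "count-triples"
    graph = (
        "<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n"
        "<http://example.org/b> <http://example.org/knows> <http://example.org/c> .\n"
    )

    def prompt(self) -> str:
        return (
            "How many triples are in this graph? Answer with a number only.\n"
            + self.graph
        )

    def evaluate(self, answer: str) -> Evaluation:
        # One N-Triples statement per line in this toy graph.
        expected = len(self.graph.strip().splitlines())
        try:
            given = int(answer.strip())
        except ValueError:
            return Evaluation(self.name, 0.0, "answer is not an integer")
        score = 1.0 if given == expected else 0.0
        return Evaluation(self.name, score, f"expected {expected}, got {given}")


def run_benchmark(tasks, model):
    """Send each task's prompt to `model` (a callable str -> str) and score the reply."""
    return [task.evaluate(model(task.prompt())) for task in tasks]
```

New tasks are added by subclassing `Task`, so scoring needs no manual inspection of answers. A stand-in model can be passed as a plain function, for example `run_benchmark([CountTriplesTask()], lambda prompt: "2")`.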
