Performance Evaluation of LLMs in Automated RDF Knowledge Graph Generation
Ioana Ramona Martin, Tudor Cioara, Ionut Anghel, Gabriel Arcas

TL;DR
This study systematically evaluates multiple LLMs and prompting strategies for automating RDF knowledge graph generation from complex cloud logs, and finds that Few-Shot prompting yields high accuracy, particularly with Llama.
Contribution
It introduces a controlled framework and a new Log-to-KG dataset for objective benchmarking of LLMs in RDF extraction from cloud logs.
Findings
Llama achieves a 99.35% F1 score and 100% valid RDF output with Few-Shot prompting.
Few-Shot learning outperforms other prompting strategies across multiple LLM architectures.
Prompt design and contextual examples are crucial for accurate RDF extraction.
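To make the role of contextual examples concrete, here is a minimal sketch of how a Few-Shot prompt for log-to-RDF extraction might be assembled. It is not the paper's actual prompt: the function name `build_few_shot_prompt`, the instruction wording, the sample log lines, and the `ex:` namespace are all assumptions made up for illustration.

```python
from typing import List, Tuple

# (log line, expected Turtle output) pair used as an in-context example.
Example = Tuple[str, str]


def build_few_shot_prompt(examples: List[Example], new_log: str) -> str:
    """Assemble a Few-Shot prompt: instruction, worked examples, then the new log.

    The instruction text and layout are hypothetical, not taken from the paper.
    """
    parts = [
        "Convert the cloud log line into RDF triples in Turtle syntax. "
        "Output only Turtle."
    ]
    # Each worked example shows the model the expected input/output mapping.
    for log_line, turtle in examples:
        parts.append(f"Log: {log_line}\nRDF:\n{turtle}")
    # The final block is left open so the model completes the RDF part.
    parts.append(f"Log: {new_log}\nRDF:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical log line and triples, invented for this sketch.
    demo_examples = [
        (
            "2024-01-01T00:00:00Z vm-01 ERROR disk usage above threshold",
            'ex:vm-01 ex:hasEvent ex:event1 .\nex:event1 ex:severity "ERROR" .',
        )
    ]
    print(build_few_shot_prompt(demo_examples, "2024-01-01T00:05:00Z vm-02 INFO service restarted"))
```

A Zero-Shot variant of the same sketch would simply pass an empty `examples` list, which makes it easy to compare prompting strategies while holding the instruction fixed.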
Abstract
Cloud systems generate large, heterogeneous log data containing critical infrastructure, application, and security information. Transforming these logs into RDF triples enables their integration into knowledge graphs, improving interpretability, root-cause analysis, and cross-service reasoning beyond what raw logs allow. Large Language Models (LLMs) offer a promising approach to automate RDF knowledge graph generation; however, their effectiveness on complex cloud logs remains largely unexplored. In this paper, we evaluate multiple LLM architectures and prompting strategies for automated RDF extraction using a controlled framework with two pipelines for systematically processing semi-structured log data. The extraction pipeline integrates multiple LLMs to identify relevant entities and relationships, automatically generating subject-predicate-object triples. These outputs are evaluated…
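The F1 score reported in the findings can be illustrated with a small sketch of triple-level scoring. The excerpt above does not describe the paper's exact evaluation procedure, so this assumes the simplest plausible variant: predicted and gold triples are compared as exact-match sets. The `triple_f1` helper and the sample triples are hypothetical.

```python
from typing import Iterable, Tuple

# A subject-predicate-object triple, represented as plain strings.
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match set comparison of triples."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up triples for illustration only.
    gold = [
        ("ex:vm-01", "ex:hasEvent", "ex:event1"),
        ("ex:event1", "ex:severity", "ERROR"),
        ("ex:event1", "ex:source", "ex:disk"),
    ]
    predicted = [
        ("ex:vm-01", "ex:hasEvent", "ex:event1"),
        ("ex:event1", "ex:severity", "ERROR"),
        ("ex:event1", "ex:source", "ex:network"),  # wrong object
    ]
    print(triple_f1(predicted, gold))
```

Exact-match scoring is strict: a triple with a single wrong term counts as both a false positive and a false negative. Checking that the output is syntactically valid RDF is a separate step that would typically be done with an RDF parser before scoring.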
