Traceable LLM-based validation of statements in knowledge graphs
Daniel Adam, Tomáš Kliegr

TL;DR
This paper introduces a traceable LLM-based method for validating RDF triples in knowledge graphs against externally retrieved documents. On the BioRED dataset it reaches 88% precision but only 44% recall, so human oversight is still required; it also shows potential for large-scale verification.
Contribution
The paper proposes a retrieval-augmented generation (RAG) workflow that verifies knowledge graph statements against external documents rather than the LLM's internal knowledge, so each verdict can be traced to a source. The workflow is evaluated on biosciences content.
Findings
Precision of 88% on the BioRED dataset
Recall of 44% on the same dataset, indicating the need for human oversight
Effective on Wikidata for large-scale statement verification
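To make the precision and recall figures concrete, here is a small Python sketch of how the two metrics relate to raw counts. The counts are assumptions: they are back-calculated from the rounded percentages and the 1,719 positive statements, treating "statement supported" as the positive class, and are not numbers reported by the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of accepted statements that were truly positive
    recall = tp / (tp + fn)     # share of truly positive statements that were accepted
    return precision, recall


# Illustrative counts only (assumed, not taken from the paper):
# 1,719 positive statements, of which roughly 44% are confirmed.
TP = 756          # positive statements confirmed by the method
FN = 1719 - TP    # positive statements the method failed to confirm
FP = 103          # negative statements wrongly confirmed

precision, recall = precision_recall(TP, FP, FN)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumed counts, more than half of the true statements go unconfirmed, which is why a recall of 44% implies that a human still has to review what the method rejects.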
Abstract
This article presents a method for verifying RDF triples using LLMs, with an emphasis on providing traceable arguments. Because LLMs currently cannot reliably identify the origin of the information used to construct the response to the user prompt, our approach is to avoid using internal LLM factual knowledge altogether. Instead, the RDF statements being verified are compared to chunks of external documents retrieved through a web search or Wikipedia. To assess the possible application of this retrieval-augmented generation (RAG) workflow on biosciences content, we evaluated 1,719 positive statements from the BioRED dataset and the same number of newly generated negative statements. The resulting precision is 88%, and recall is 44%. This indicates that the method requires human oversight. We also evaluated the method on the SNLI dataset, which allowed us to compare our approach with models…
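The workflow described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: `Triple`, `Verdict`, `chunk`, `verify`, and `keyword_judge` are hypothetical names, the retrieval step is replaced by an in-memory dictionary of documents, and the LLM call is replaced by a trivial keyword check so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

    def as_sentence(self) -> str:
        # Verbalize the RDF triple so it can be compared with text.
        return f"{self.subject} {self.predicate} {self.obj}"


@dataclass(frozen=True)
class Verdict:
    supported: bool
    evidence: Optional[str]  # the chunk that justified the verdict
    source: Optional[str]    # identifier of the document the chunk came from


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def verify(
    triple: Triple,
    documents: dict[str, str],
    judge: Callable[[str, str], bool],
) -> Verdict:
    """Check a triple against external chunks only, keeping the evidence."""
    statement = triple.as_sentence()
    for source, text in documents.items():
        for piece in chunk(text):
            # The judge sees only the statement and the chunk, so any
            # positive verdict is traceable to a specific passage.
            if judge(statement, piece):
                return Verdict(True, piece, source)
    return Verdict(False, None, None)


def keyword_judge(statement: str, passage: str) -> bool:
    """Stand-in for an LLM call (assumption): every word must appear."""
    passage_words = set(passage.lower().split())
    return all(word in passage_words for word in statement.lower().split())


docs = {"doc-1": "Studies show that aspirin inhibits COX-1 in platelets"}
result = verify(Triple("aspirin", "inhibits", "COX-1"), docs, keyword_judge)
print(result.supported, result.source)
```

Returning the matching chunk and its source alongside the verdict is what makes the result auditable: a reviewer can read the cited passage instead of trusting the model's unstated internal knowledge.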
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies
