ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs against Textual Sources
Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl

TL;DR
ProVe is an automated pipeline that verifies whether Knowledge Graph triples are supported by their textual provenance, making provenance verification in knowledge graphs more trustworthy and scalable.
Contribution
It introduces a novel, automated, multi-step pipeline combining rule-based and machine learning methods for provenance verification of Knowledge Graph triples.
Findings
Achieves 87.5% accuracy in support detection
Reaches an F1-macro score of 82.9% on text-rich sources
Demonstrates effectiveness on a Wikidata dataset
Abstract
Knowledge Graphs are repositories of information that gather data from a multitude of domains and sources in the form of semantic triples, serving as a source of structured data for various crucial applications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve as secondary sources of information and depend on well-documented and verifiable provenance to ensure their trustworthiness and usability. However, their ability to systematically assess and assure the quality of this provenance, most crucially whether it properly supports the graph's information, relies mainly on manual processes that do not scale with size. ProVe aims at remedying this, consisting of a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance. ProVe is intended to assist information…
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Scientific Computing and Data Management
