ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs against Textual Sources

Gabriel Amaral; Odinaldo Rodrigues; Elena Simperl

arXiv:2210.14846·cs.CL·October 27, 2022

Open Access

TL;DR

ProVe is an automated pipeline that verifies the support of Knowledge Graph triples by textual provenance, improving trustworthiness and scalability of provenance verification in knowledge graphs.

Contribution

It introduces a novel, automated, multi-step pipeline combining rule-based and machine learning methods for provenance verification of Knowledge Graph triples.

Findings

01. Achieved 87.5% accuracy in support detection

02. Reached an F1-macro score of 82.9% on text-rich sources

03. Demonstrated effectiveness on a Wikidata dataset
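
As a reading aid for the two metrics quoted above, here is a minimal sketch of how accuracy and macro-averaged F1 are computed for a support-detection classifier. The label names and the toy predictions are hypothetical and chosen only for illustration; they are not the paper's data and do not reproduce its reported scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for six (triple, source text) pairs.
y_true = ["supports", "supports", "supports",
          "not_supports", "not_supports", "supports"]
y_pred = ["supports", "supports", "not_supports",
          "not_supports", "supports", "supports"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging gives each class equal weight regardless of how often it occurs, which is why it can sit below accuracy when the less frequent class is harder to predict.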

Abstract

Knowledge Graphs are repositories of information that gather data from a multitude of domains and sources in the form of semantic triples, serving as a source of structured data for various crucial applications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve as secondary sources of information and depend on well-documented and verifiable provenance to ensure their trustworthiness and usability. However, their ability to systematically assess and assure the quality of this provenance, most crucially whether it properly supports the graph's information, relies mainly on manual processes that do not scale with size. ProVe aims at remedying this, consisting of a pipelined approach that automatically verifies whether a Knowledge Graph triple is supported by text extracted from its documented provenance. ProVe is intended to assist information…

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Scientific Computing and Data Management