Non-Parametric Class Completeness Estimators for Collaborative Knowledge Graphs -- The Case of Wikidata
Michael Luggen, Djellel Difallah, Cristina Sarasua, Gianluca Demartini, and Philippe Cudré-Mauroux

TL;DR
This paper introduces statistical, non-parametric methods to estimate the completeness of classes in collaborative knowledge graphs like Wikidata, addressing challenges of size, dynamism, and user activity.
Contribution
It develops novel class cardinality estimators based on species estimation techniques, tailored for large, dynamic, collaborative knowledge graphs.
Findings
Estimator performance is heavily influenced by the number and frequency of unique class instances.
Bursts of data insertions can lead to overestimation if not properly managed.
Estimator stability can be used to measure convergence towards true class size.
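To make the link between observation frequencies and the estimate concrete, here is a minimal sketch of one classic non-parametric species estimator, the bias-corrected Chao1. This summary does not say which estimators the paper uses, so Chao1 is an illustrative stand-in, not the authors' method. The function names and the idea of treating each edit that mentions an entity as one observation are assumptions made for this example.

```python
from collections import Counter


def chao1_estimate(observations):
    """Bias-corrected Chao1 estimate of the total number of distinct entities.

    `observations` is an iterable of entity identifiers, one per sighting
    (hypothetically, one per edit that mentions an instance of the class).
    """
    counts = Counter(observations)        # sightings per entity
    freq = Counter(counts.values())       # how many entities were seen k times
    distinct = len(counts)                # entities observed at least once
    f1 = freq.get(1, 0)                   # entities seen exactly once
    f2 = freq.get(2, 0)                   # entities seen exactly twice
    # Many singletons relative to doubletons suggest many unseen entities.
    return distinct + f1 * (f1 - 1) / (2 * (f2 + 1))


def completeness(observations):
    """Ratio of observed distinct entities to the estimated class size."""
    observations = list(observations)
    distinct = len(set(observations))
    estimate = chao1_estimate(observations)
    return distinct / estimate if estimate else 1.0


if __name__ == "__main__":
    sample = ["a", "a", "b", "c", "c", "d"]   # made-up sightings
    print(chao1_estimate(sample))
    print(completeness(sample))
```

Because the estimate depends only on the counts of entities seen once and twice, a burst that inserts many new entities exactly once inflates the singleton count and pushes the estimate upward, which is consistent with the overestimation noted above.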
Abstract
Collaborative Knowledge Graph platforms allow humans and automated scripts to collaborate in creating, updating and interlinking entities and facts. To ensure both the completeness of the data as well as a uniform coverage of the different topics, it is crucial to identify underrepresented classes in the Knowledge Graph. In this paper, we tackle this problem by developing statistical techniques for class cardinality estimation in collaborative Knowledge Graph platforms. Our method is able to estimate the completeness of a class - as defined by a schema or ontology - hence can be used to answer questions such as "Does the knowledge base have a complete list of all {Beer Brands|Volcanos|Video Game Consoles}?" As a use-case, we focus on Wikidata, which poses unique challenges in terms of the size of its ontology, the number of users actively populating its graph, and its extremely dynamic…
