Re-thinking Knowledge Graph Completion Evaluation from an Information Retrieval Perspective
Ying Zhou, Xuanang Chen, Ben He, Zheng Ye, Le Sun

TL;DR
This paper critically examines the current entity ranking evaluation protocol for knowledge graph completion, revealing its sensitivity to label sparsity and proposing IR-inspired evaluation methods for more reliable system comparison.
Contribution
The paper introduces a TREC-style pooling approach to create more complete labels for KGC evaluation. It also compares macro and micro metrics, highlighting the advantages of macro metrics under label sparsity.
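As a rough illustration of TREC-style pooling in general, the sketch below takes the union of the top-ranked candidates from several systems for each query, so that only the pooled candidates need human judgment. The function name `build_pool`, the `depth` parameter, the data layout, and the toy entities are assumptions made for this example, not details taken from the paper.

```python
def build_pool(runs, depth):
    """Union the top-`depth` candidates of every system, per query.

    runs: dict mapping system name -> dict mapping query -> ranked candidate list.
    Returns a dict mapping each query to the set of candidates to be judged.
    """
    pool = {}
    for ranking_by_query in runs.values():
        for query, ranked in ranking_by_query.items():
            # Only the top `depth` candidates of each system enter the pool.
            pool.setdefault(query, set()).update(ranked[:depth])
    return pool


# Toy query: (head entity, relation), with the tail entity masked.
query = ("Paris", "capital_of")
runs = {
    "system_a": {query: ["France", "Italy", "Spain"]},
    "system_b": {query: ["France", "Germany", "Italy"]},
}

pool = build_pool(runs, depth=2)
print(sorted(pool[query]))
```

A larger `depth` yields more complete labels but more candidates to annotate, which is the trade-off between completeness and human effort.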
Findings
Switching to complete labels drastically changes system rankings.
Macro metrics are more stable and discriminative than micro metrics.
TREC-style pooling balances label completeness and human effort.
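The difference between micro and macro averaging can be sketched with mean reciprocal rank (MRR). This is a minimal example under one common reading: the micro metric averages over all individual answer entities, while the macro metric first averages within each query and then across queries. The helper names, the filtered-rank convention, and the toy data are assumptions for illustration; the paper's exact metric definitions are not reproduced here.

```python
def filtered_rank(ranked, answer, answers):
    """Rank of `answer` in `ranked`, skipping the other known answers."""
    rank = 1
    for candidate in ranked:
        if candidate == answer:
            return rank
        if candidate not in answers:
            rank += 1
    raise ValueError(f"{answer!r} not found in ranking")


def reciprocal_ranks(ranked, answers):
    return [1.0 / filtered_rank(ranked, a, answers) for a in answers]


def micro_mrr(queries):
    # One term per answer entity: queries with many answers weigh more.
    rrs = [rr for ranked, answers in queries for rr in reciprocal_ranks(ranked, answers)]
    return sum(rrs) / len(rrs)


def macro_mrr(queries):
    # One term per query: each query contributes equally.
    per_query = []
    for ranked, answers in queries:
        rrs = reciprocal_ranks(ranked, answers)
        per_query.append(sum(rrs) / len(rrs))
    return sum(per_query) / len(per_query)


# Each query is (ranked candidates, set of labelled answers).
queries = [
    (["a", "b", "c", "d"], {"a", "b"}),  # both answers get filtered rank 1
    (["x", "y", "z"], {"z"}),            # single answer at rank 3
]

print(micro_mrr(queries))  # (1 + 1 + 1/3) / 3
print(macro_mrr(queries))  # (1 + 1/3) / 2
```

In this toy case the query with two labelled answers contributes two terms to the micro average but only one to the macro average, so the number of labels per query changes the micro score more directly.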
Abstract
Knowledge graph completion (KGC) aims to infer missing knowledge triples based on known facts in a knowledge graph. Current KGC research mostly follows an entity ranking protocol, wherein effectiveness is measured by the predicted rank of a masked entity in a test triple. The overall performance is then given by a micro(-average) metric over all individual answer entities. Due to the incomplete nature of large-scale knowledge bases, such an entity ranking setting is likely affected by unlabelled top-ranked positive examples, raising questions about whether the current evaluation protocol is sufficient to guarantee a fair comparison of KGC systems. To this end, this paper presents a systematic study on whether and how label sparsity affects the current KGC evaluation with the popular micro metrics. Specifically, inspired by the TREC paradigm for large-scale information retrieval…
