CoDEx: A Comprehensive Knowledge Graph Completion Benchmark
Tara Safavi, Danai Koutra

TL;DR
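CoDEx is a knowledge graph completion benchmark derived from Wikidata and Wikipedia. It comprises three knowledge graphs of varying size and structure, multilingual descriptions of entities and relations, and tens of thousands of verified hard negative triples, and it is designed to be more diverse and more difficult than existing link prediction benchmarks.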
Contribution
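The paper contributes the benchmark itself, an analysis of logical relation patterns in each dataset, and baseline link prediction and triple classification results for five extensively tuned embedding models. It also compares CoDEx with FB15K-237 to show that CoDEx is more diverse and more difficult.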
Findings
CoDEx covers more diverse and interpretable content than FB15K-237.
It is a more difficult benchmark than FB15K-237 for current embedding models.
Baseline link prediction and triple classification performance varies across the three CoDEx datasets.
Abstract
We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false. To characterize CoDEx, we contribute thorough empirical analyses and benchmarking experiments. First, we analyze each CoDEx dataset in terms of logical relation patterns. Next, we report baseline link prediction and triple classification results on CoDEx for five extensively tuned embedding models. Finally, we differentiate CoDEx from the popular FB15K-237 knowledge graph completion dataset by showing that CoDEx covers more diverse and interpretable content,…
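As a rough illustration of what analyzing a dataset "in terms of logical relation patterns" can involve, the sketch below measures one such pattern, symmetry, on a handful of toy triples. The function name `symmetry_scores`, the toy data, and the choice of symmetry as the example pattern are assumptions made for illustration; this is not the authors' analysis code and does not use CoDEx data.

```python
from collections import defaultdict


def symmetry_scores(triples):
    """For each relation, return the fraction of its (head, tail) pairs
    whose reverse (tail, head) also appears with the same relation."""
    by_relation = defaultdict(set)
    for head, relation, tail in triples:
        by_relation[relation].add((head, tail))

    scores = {}
    for relation, pairs in by_relation.items():
        reversed_present = sum((tail, head) in pairs for head, tail in pairs)
        scores[relation] = reversed_present / len(pairs)
    return scores


# Toy triples invented for illustration (hypothetical, not from CoDEx).
toy_triples = [
    ("alice", "sibling", "bob"),
    ("bob", "sibling", "alice"),
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "lyon"),
]

scores = symmetry_scores(toy_triples)
for relation, score in sorted(scores.items()):
    print(f"{relation}: {score:.2f}")
```

A relation with a score near 1 behaves symmetrically in the data, while a score near 0 suggests it does not. Pattern statistics of this kind can help explain why some embedding models handle certain relations better than others.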
Taxonomy
Topics: Advanced Graph Neural Networks · Data Quality and Management · Topic Modeling
