KG20C & KG20C-QA: Scholarly Knowledge Graph Benchmarks for Link Prediction and Question Answering
Hung-Nghiep Tran, Atsuhiro Takasu

TL;DR
This paper introduces KG20C and KG20C-QA, high-quality scholarly knowledge graph datasets and benchmarks for link prediction and question answering, supporting research with detailed documentation and evaluation protocols.
Contribution
The paper provides the first peer-reviewed description of KG20C and its QA benchmark KG20C-QA, including construction details, QA templates, and baseline evaluations.
Findings
Benchmarking of knowledge graph embedding methods on KG20C-QA
Analysis of performance across different relation types
Provision of reproducible evaluation protocols
Abstract
In this paper, we present KG20C and KG20C-QA, two curated datasets for advancing question answering (QA) research on scholarly data. KG20C is a high-quality scholarly knowledge graph constructed from the Microsoft Academic Graph through targeted selection of venues, quality-based filtering, and schema definition. Although KG20C has been available online in non-peer-reviewed sources such as a GitHub repository, this paper provides the first formal, peer-reviewed description of the dataset, including clear documentation of its construction and specifications. KG20C-QA is built upon KG20C to support QA tasks on scholarly data. We define a set of QA templates that convert graph triples into natural language question-answer pairs, producing a benchmark that can be used both with graph-based models such as knowledge graph embeddings and with text-based models such as large language models. We…
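The template idea described in the abstract (graph triples rendered as natural language question-answer pairs) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the relation names, the template wordings, the placeholder entities, and the `triples_to_qa` helper are hypothetical and are not taken from KG20C-QA itself.

```python
from collections import defaultdict

# Hypothetical templates: relation name -> question pattern about the head entity.
# These names and wordings are illustrative only, not the KG20C-QA templates.
TEMPLATES = {
    "author_write_paper": "Which papers did {head} write?",
    "paper_in_venue": "In which venue was {head} published?",
}


def triples_to_qa(triples, templates=TEMPLATES):
    """Group tail entities by (head, relation) and render one QA pair per group."""
    answers = defaultdict(list)
    for head, relation, tail in triples:
        # Relations without a template are skipped rather than guessed at.
        if relation in templates:
            answers[(head, relation)].append(tail)
    return [
        {"question": templates[rel].format(head=head), "answers": sorted(tails)}
        for (head, rel), tails in answers.items()
    ]


# Placeholder triples in (head, relation, tail) form; not real dataset entries.
example_triples = [
    ("Author A", "author_write_paper", "Paper X"),
    ("Author A", "author_write_paper", "Paper Y"),
    ("Paper X", "paper_in_venue", "Venue V"),
    ("Paper X", "unmapped_relation", "Something"),
]

qa_pairs = triples_to_qa(example_triples)
for pair in qa_pairs:
    print(pair["question"], pair["answers"])
```

Grouping tails by head and relation means a question with several correct answers yields one pair with an answer list, which is one plausible way to make the same pairs usable by both ranking-based graph models and text-based models.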
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Expert finding and Q&A systems
