Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets
Erica Cai, Sean McQuade, Kevin Young, Brendan O'Connor

TL;DR
This paper introduces AffilKG, a novel collection of datasets pairing complete book scans with large, labeled knowledge graphs to evaluate and improve the accuracy of automated knowledge graph extraction for social science applications.
Contribution
The paper presents the first datasets linking entire books with comprehensive knowledge graphs, enabling evaluation of extraction methods in real-world social science contexts.
Findings
Model performance varies significantly across datasets
AffilKG enables benchmarking of the impact of extraction errors
AffilKG supports validating KG extraction methods for social science research
Abstract
When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities -- useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets,…
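To make the notion of an affiliation graph concrete, the following is a minimal sketch that treats the graph as a set of (Person, Organization) Member edges and compares an extracted graph against a labeled one. Everything in it is an assumption made for illustration: the entity names are invented toy data, the helper names `edge_prf` and `org_sizes` are hypothetical, and matching edges by exact string equality is a simplification. It does not reproduce the AffilKG file format or the paper's evaluation protocol.

```python
from collections import defaultdict


def edge_prf(gold, pred):
    """Precision, recall, and F1 of predicted Member edges against gold edges.

    Edges are (person, organization) tuples compared by exact match,
    which is an assumption of this sketch.
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def org_sizes(edges):
    """Number of distinct members per organization, a simple downstream statistic."""
    members = defaultdict(set)
    for person, org in edges:
        members[org].add(person)
    return {org: len(people) for org, people in members.items()}


# Invented toy data: a labeled (gold) graph and an imperfect extraction.
gold = {
    ("A. Smith", "Farmers Union"),
    ("B. Jones", "Farmers Union"),
    ("B. Jones", "Town Choir"),
    ("C. Lee", "Town Choir"),
}
pred = {
    ("A. Smith", "Farmers Union"),
    ("B. Jones", "Town Choir"),
    ("C. Lee", "Farmers Union"),  # wrong organization
}

if __name__ == "__main__":
    print(edge_prf(gold, pred))
    print(org_sizes(gold))
    print(org_sizes(pred))
```

Comparing `org_sizes(gold)` with `org_sizes(pred)` shows how edge-level errors can shift a graph-level statistic, which is the kind of downstream effect the abstract asks about.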
Taxonomy
Topics: Advanced Graph Neural Networks · Complex Network Analysis Techniques · Data Quality and Management
