Turning Citation Networks Inside Out: Studying Science Using Content-Based Knowledge Graphs from LLM-Derived Taxonomies
Seorin Kim, Vincent Holst, Vincent Ginis

TL;DR
This paper introduces a content-based knowledge graph approach using LLM-derived taxonomies to map scientific fields, revealing methodological structures and temporal variations beyond traditional citation analysis.
Contribution
It presents a novel 'inside-out' method that reconstructs scientific field structures directly from text using interpretable knowledge components and domain-specific taxonomies.
Findings
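Identified a stable methodological backbone, centered on regression-based mobility measures, across 617 studies on intergenerational wealth mobility.
Revealed substantial temporal variation in how knowledge components are recombined.
Used normalized betweenness-to-connectivity ratios to identify components and pairings that act as structural bridges.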
Abstract
Scientific fields are often mapped using citations and metadata, despite knowledge being transmitted primarily through content. We introduce an 'inside-out' approach that reconstructs field structure directly from text by representing each paper as a small set of interpretable knowledge components. Using a large language model to induce domain-specific taxonomies and label papers, each publication is encoded as a triplet of measure, data type, and research-question type. These triplets define a knowledge graph with edges weighted by shared papers. Applied to 617 studies on intergenerational wealth mobility, the graph reveals a stable methodological backbone centered on regression-based mobility measures, alongside substantial temporal variation in component recombination. We further utilize normalized betweenness-to-connectivity ratios to identify components and pairings that act as…
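The graph construction described in the abstract can be illustrated with a minimal sketch. It assumes each paper has already been labelled with a (measure, data type, research-question type) triplet, and it links every pair of components that appear in the same paper, weighting the edge by the number of shared papers. The sample labels, the function names `build_graph` and `weighted_degree`, and the use of weighted degree as a stand-in for "connectivity" are illustrative assumptions, not the authors' taxonomy or code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labelled papers: (measure, data type, research-question type).
# These labels are invented for illustration and are not from the paper.
papers = [
    ("rank-rank regression", "administrative data", "descriptive"),
    ("rank-rank regression", "survey data", "descriptive"),
    ("intergenerational elasticity", "survey data", "causal"),
]

def build_graph(triplets):
    """Return a Counter mapping each component pair to its shared-paper count."""
    edges = Counter()
    for measure, data_type, question in triplets:
        # Tag each component with its slot so equal strings in
        # different slots stay distinct nodes.
        nodes = [("measure", measure), ("data", data_type), ("question", question)]
        # Sorting gives every undirected edge a single canonical key.
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum incident edge weights per node (an assumed proxy for connectivity)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

edges = build_graph(papers)
degree = weighted_degree(edges)

for (a, b), weight in sorted(edges.items()):
    print(f"{a[1]} -- {b[1]}: {weight}")
```

In this toy input, the edge between the regression measure and the descriptive question type has the largest weight because two papers share both components. The abstract's bridge analysis would additionally require a betweenness computation and a normalization, neither of which is specified in the truncated text, so both are left out here.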
Taxonomy
Topics: Advanced Graph Neural Networks · Scientometrics and Bibliometrics Research · Research Data Management Practices
