Quotegraph: A Social Network Extracted from Millions of News Quotations
Marko Čuljak, Robert West, Andreas Spitz, Akhil Arora

TL;DR
Quotegraph is a large-scale, biographically enriched social network derived from news quotations, enabling analysis of public figures' behavior and relationships over time.
Contribution
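The paper introduces Quotegraph, a large-scale social network built from speaker-attributed news quotations and enriched with biographic and contextual data. Its construction pipeline is language agnostic, so similar networks can be built from non-English news corpora.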
Findings
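Contains 528,000 unique nodes and 8.63 million directed edges, each pointing from a speaker to a person they mention.
Links nodes to their Wikidata items for biographic information such as nationality, gender, and political affiliation.
Enables analysis of public figures in the context of English news articles published between 2008 and 2020.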
Abstract
We introduce Quotegraph, a novel large-scale social network derived from speaker-attributed quotations in English news articles published between 2008 and 2020. Quotegraph consists of 528 thousand unique nodes and 8.63 million directed edges, pointing from speakers to persons they mention. The nodes are linked to their corresponding items in Wikidata, thereby endowing the dataset with detailed biographic entity information, including nationality, gender, and political affiliation. Being derived from Quotebank, a massive corpus of quotations, relations in Quotegraph are additionally enriched with the information about the context in which they are featured. Each part of the network construction pipeline is language agnostic, enabling the construction of similar datasets based on non-English news corpora. We believe Quotegraph is a compelling resource for computational social scientists,…
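The edge definition in the abstract (a directed edge from each speaker to each person they mention) can be illustrated with a minimal sketch. The record layout, the placeholder identifiers such as `Q_A`, and the helper names `build_edges` and `out_neighbors` are assumptions made for illustration; they are not the Quotebank schema or the authors' pipeline, and counting quotations as edge weights is likewise an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (speaker_id, [ids of persons mentioned in the quotation]).
# The identifiers are placeholders, not real Wikidata items.
quotations = [
    ("Q_A", ["Q_B"]),
    ("Q_A", ["Q_B", "Q_C"]),
    ("Q_B", ["Q_A"]),
    ("Q_C", []),
]


def build_edges(records):
    """Count, for each (speaker, mentioned) pair, how many quotations support it."""
    edges = Counter()
    for speaker, mentioned in records:
        # De-duplicate mentions within one quotation and skip self-mentions.
        for person in set(mentioned):
            if person != speaker:
                edges[(speaker, person)] += 1
    return edges


def out_neighbors(edges):
    """Group the directed edges into an adjacency map: speaker -> set of mentioned persons."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return dict(adjacency)


edges = build_edges(quotations)
adjacency = out_neighbors(edges)
```

Here `edges` maps each directed pair to the number of supporting quotations, and `adjacency` lists whom each speaker mentions. A real pipeline would also need quotation attribution and entity linking, which this sketch takes as given.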
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Text Analysis Techniques · Topic Modeling
