Wikipedia Citations: A comprehensive dataset of citations with identifiers extracted from English Wikipedia
Harshdeep Singh, Robert West, Giovanni Colavizza

TL;DR
This paper presents Wikipedia Citations, a large dataset of citations extracted from English Wikipedia, complete with scholarly identifiers, that enables analysis of which sources Wikipedia relies on and how it cites them.
Contribution
It introduces a comprehensive, publicly available dataset of Wikipedia citations with scholarly identifiers, facilitating research on source reliability and citation analysis.
Findings
6.7% of English Wikipedia articles cite at least one journal article with a DOI
Wikipedia cites only 2% of Web of Science articles with DOIs
29.3 million citations extracted from 6.1 million English Wikipedia articles (as of May 2020)
Abstract
Wikipedia's contents are based on reliable, published sources. To date, relatively little is known about which sources Wikipedia relies on, in part because extracting citations and identifying the cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive dataset of citations extracted from Wikipedia. A total of 29.3M citations were extracted from 6.1M English Wikipedia articles as of May 2020 and classified as citing books, journal articles, or Web content. We were thus able to extract 4.0M citations to scholarly publications with known identifiers (including DOI, PMC, PMID, and ISBN) and to equip a further 261K citations with DOIs retrieved from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI, and that Wikipedia cites just 2% of all articles with a DOI currently indexed in…
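The abstract describes classifying each citation as a book, journal article, or Web source and attaching identifiers such as DOI, PMC, PMID, and ISBN. The Python sketch below illustrates that idea on raw wikitext; it is not the authors' extraction pipeline. It assumes citations appear as `{{cite journal}}`, `{{cite book}}`, or `{{cite web}}` templates carrying `doi=`, `pmc=`, `pmid=`, or `isbn=` parameters, and the names `TEMPLATE_TYPES`, `parse_citations`, and `share_citing_doi_journals` are hypothetical. The last function mirrors the "at least one journal article with a DOI" measure on toy input only.

```python
import re

# Hypothetical mapping from citation template names to the three coarse
# source types mentioned in the abstract; anything else becomes "other".
TEMPLATE_TYPES = {
    "cite book": "book",
    "cite journal": "journal",
    "cite web": "web",
}

# Identifier parameters named in the abstract.
IDENTIFIER_KEYS = ("doi", "pmc", "pmid", "isbn")

# Matches a flat {{cite ... | key=value | ...}} template. Assumption: no nested
# templates inside the citation, which real wikitext often violates.
TEMPLATE_RE = re.compile(r"\{\{\s*(cite [a-z]+)\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def parse_citations(wikitext):
    """Return one dict per cite template: its coarse type and any identifiers."""
    citations = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).lower()
        params = {}
        for part in match.group(2).split("|"):
            key, sep, value = part.partition("=")
            if sep:  # skip positional (unnamed) parameters
                params[key.strip().lower()] = value.strip()
        # Keep only identifier parameters that are present and non-empty.
        ids = {k: params[k] for k in IDENTIFIER_KEYS if params.get(k)}
        citations.append({"type": TEMPLATE_TYPES.get(name, "other"), "ids": ids})
    return citations


def share_citing_doi_journals(articles):
    """Fraction of articles (title -> wikitext) citing >= 1 journal article with a DOI."""
    if not articles:
        return 0.0
    hits = sum(
        any(c["type"] == "journal" and "doi" in c["ids"] for c in parse_citations(text))
        for text in articles.values()
    )
    return hits / len(articles)


# Toy input, invented for illustration only.
sample = {
    "Article A": "Text.<ref>{{cite journal |title=X |doi=10.1000/xyz123 |pmid=123}}</ref>",
    "Article B": "Text.<ref>{{cite web |url=https://example.org |title=Y}}</ref>",
}
print(share_citing_doi_journals(sample))  # 0.5
```

A regular expression keeps the sketch dependency-free; a production extractor would use a full wikitext parser to cope with nested templates, comments, and citations written without templates.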
Taxonomy
Topics: Wikis in Education and Collaboration · Topic Modeling · Biomedical Text Mining and Ontologies
