The OpenCitations Index
Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti

TL;DR
The OpenCitations Index is a comprehensive, open, and deduplicated citation database that integrates data from multiple sources, enabling open access and systematic querying of over 2 billion citation links using Semantic Web technologies.
Contribution
This paper introduces the OpenCitations Index, a large-scale open citation data collection with a novel deduplication mechanism and the use of persistent identifiers, enhancing data accuracy and accessibility.
Findings
Stores over 2 billion unique citation links
Integrates data from multiple major sources
Provides open access via various query services
Abstract
This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection includes citation data harvested from multiple sources. Because different sources may describe the same bibliographic entities with different identifiers, and may therefore provide the same citation more than once, a deduplication mechanism has been implemented. It ensures that each citation integrated into the OpenCitations Index is identified uniquely, even when different identifiers are used. This mechanism follows a specific workflow, which encompasses preprocessing of the original source data, management of the provided…
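The deduplication idea described in the abstract can be illustrated with a minimal sketch. It assumes that every external identifier (such as a DOI or a PMID) has already been resolved to a single internal identifier per bibliographic entity. The mapping, the `omid:` values, the example identifiers and the function name `deduplicate_citations` are all hypothetical, and the actual OpenCitations workflow (only partly described above) involves further steps not shown here. Citations are stored as pairs of internal identifiers in a set, so the same link reported by two sources under different identifiers is kept only once.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mapping from external identifiers to one internal identifier
# per bibliographic entity (all values below are invented for illustration).
IdMap = Dict[str, str]


def deduplicate_citations(
    raw_citations: Iterable[Tuple[str, str]], id_map: IdMap
) -> Set[Tuple[str, str]]:
    """Collapse citations whose citing and cited entities resolve to the same internal IDs."""
    unique: Set[Tuple[str, str]] = set()
    for citing, cited in raw_citations:
        citing_id = id_map.get(citing)
        cited_id = id_map.get(cited)
        if citing_id is None or cited_id is None:
            # Simplification: unresolved identifiers are skipped in this sketch.
            continue
        # A set keeps each (citing, cited) pair of internal IDs only once.
        unique.add((citing_id, cited_id))
    return unique


# Two hypothetical sources: one reports citations with DOIs, the other with PMIDs.
id_map: IdMap = {
    "doi:10.9999/a": "omid:br/1",
    "pmid:111": "omid:br/1",
    "doi:10.9999/b": "omid:br/2",
    "pmid:222": "omid:br/2",
    "doi:10.9999/c": "omid:br/3",
}
raw = [
    ("doi:10.9999/a", "doi:10.9999/b"),  # from source 1
    ("pmid:111", "pmid:222"),            # same citation, from source 2
    ("doi:10.9999/a", "doi:10.9999/c"),  # a distinct citation
]
result = deduplicate_citations(raw, id_map)
print(sorted(result))
```

Here the first two raw citations describe the same link, so the sketch returns two unique citations rather than three.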
