(Non-)retracted academic papers in OpenAlex
Christian Hauschke, Serhii Nazarovets

TL;DR
This paper identifies a flaw in OpenAlex's handling of retracted papers, revealing that its boolean classification often mislabels publication status, which can mislead users relying on its data.
Contribution
The study uncovers a specific issue in OpenAlex's data integration process that causes misclassification of retracted papers, highlighting a critical flaw in scholarly data management.
Findings
OpenAlex's 'is_retracted' field often misclassifies papers.
The issue affects data from Dec 22, 2023 to Mar 19, 2024.
Users should verify, and if necessary replace, data obtained from OpenAlex during this period.
Abstract
The proliferation of scholarly publications underscores the necessity for reliable tools to navigate scientific literature. OpenAlex, an emerging platform amalgamating data from diverse academic sources, holds promise in meeting these evolving demands. Nonetheless, our investigation uncovered a flaw in OpenAlex's portrayal of publication status, particularly concerning retractions. Despite accurate metadata sourced from the Crossref database, OpenAlex consolidated this information into a single boolean field, "is_retracted," leading to misclassifications of papers. This challenge not only impacts OpenAlex users but also extends to users of other academic resources integrating the OpenAlex API. The issue affects data provided by OpenAlex in the period between 22 Dec 2023 and 19 Mar 2024. Anyone using data from this period should urgently check it and replace it if necessary.
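As a minimal sketch of the check the abstract recommends, the snippet below flags locally stored records whose retrieval date falls inside the affected window (22 Dec 2023 to 19 Mar 2024), so that their "is_retracted" value can be re-verified. The record layout, the "retrieved_on" field, the placeholder IDs, and the treatment of both boundary dates as inclusive are assumptions made for illustration; they are not part of OpenAlex or of the paper.

```python
from datetime import date

# Affected window as stated in the abstract.
# Treating both end dates as inclusive is an assumption.
AFFECTED_START = date(2023, 12, 22)
AFFECTED_END = date(2024, 3, 19)


def needs_recheck(retrieved_on: date) -> bool:
    """Return True if data retrieved on this date falls in the affected window."""
    return AFFECTED_START <= retrieved_on <= AFFECTED_END


def flag_records(records):
    """Return the records whose 'is_retracted' value should be re-verified.

    Each record is a dict with a hypothetical 'retrieved_on' key holding an
    ISO date string (YYYY-MM-DD) that says when it was pulled from OpenAlex.
    """
    return [
        r for r in records
        if needs_recheck(date.fromisoformat(r["retrieved_on"]))
    ]


# Hypothetical local snapshot; IDs and values are placeholders.
records = [
    {"id": "W1", "is_retracted": True, "retrieved_on": "2023-12-21"},
    {"id": "W2", "is_retracted": True, "retrieved_on": "2023-12-22"},
    {"id": "W3", "is_retracted": False, "retrieved_on": "2024-03-19"},
    {"id": "W4", "is_retracted": False, "retrieved_on": "2024-03-20"},
]

suspect = flag_records(records)
print([r["id"] for r in suspect])
```

Flagged records are only candidates for review: the sketch does not decide whether a given "is_retracted" value is wrong, it only narrows down which records were retrieved in the period the authors identify.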
Taxonomy
Topics: Digital and Cyber Forensics · Academic integrity and plagiarism
