One in Eight OpenAlex Abstracts Has Integrity Issues

Seorin Kim; Vincent Holst; Vincent Ginis

arXiv:2605.20168 · cs.DL · May 20, 2026

TL;DR

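This study assesses the integrity of 10,000 randomly sampled English-language journal abstracts from OpenAlex. It finds that 12% have integrity issues, most commonly insufficient content or misplaced metadata. This matters for computational metascience research that uses abstracts as primary data.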

Contribution

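It presents a systematic assessment of abstract quality in a widely used bibliographic database. The assessment combines human expert review with large language model classification and identifies seven distinct failure modes that can undermine research built on these abstracts.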

Findings

01

12% of abstracts have integrity issues

02

Insufficient content and misplaced metadata are the most common failure modes

03

A forthcoming community portal will support collective annotation

Abstract

Scientific abstracts are increasingly used as primary data in computational metascience research, yet the quality of these abstracts in widely used bibliographic databases has not been systematically examined. We assess the integrity of 10,000 randomly sampled English-language journal abstracts from OpenAlex using a two-stage annotation protocol combining human expert review and large language model classification. We identify seven distinct failure modes and find that 12% of abstracts have integrity issues, with insufficient content and misplaced metadata being the most prevalent. We discuss implications for downstream research and describe a forthcoming community portal to support collective annotation efforts.
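Because the 12% rate is estimated from a sample, it carries sampling uncertainty. The following minimal Python sketch computes a 95% Wilson score interval for that rate. It assumes the rounded 12% corresponds to exactly 1,200 of the 10,000 abstracts and that the sample behaves like a simple random sample. Both are assumptions, and the resulting interval is illustrative rather than a figure reported by the authors. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center is shifted slightly toward 0.5 relative to the raw proportion.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical inputs: 1,200 flagged abstracts out of 10,000 sampled (12%).
low, high = wilson_interval(1200, 10_000)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from about 11.4% to 12.7%. It reflects only sampling variability and ignores any misclassification in the annotation protocol itself.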
