# A dataset of scientific citations in U.S. patent Office Actions

**Authors:** Kyle Higham, Hannah Kotula, Emma Scharfmann, Steve Gong, Gaétan de Rassenfosse

PMC · DOI: 10.1038/s41597-026-06720-7 · Scientific Data · 2026-01-31

## TL;DR

This paper introduces a curated dataset of about 850,000 citations extracted from U.S. patent Office Actions, giving a granular view of patent examination and of science-technology linkages.

## Contribution

The contribution is a curated dataset of citations from patent Office Actions in which the references to scientific literature are parsed, cleaned, disambiguated, and linked to OpenAlex for broader research use.

## Key findings

- The dataset includes about 850,000 citations extracted from U.S. patent Office Actions, of which 265,000 are references to scientific literature.
- Each citation is classified into one of 14 categories; the scientific references are parsed, cleaned, and disambiguated using machine learning and external bibliographic services, and the disambiguated records are linked to OpenAlex.
- All data and code are openly available to support research on examiner behavior, science-technology linkages, and citation-based metrics.

## Abstract

We present a curated dataset of about 850,000 citations extracted from Office Actions issued by examiners at the United States Patent and Trademark Office. These references, historically underused due to accessibility challenges, provide a granular view into the patent examination process and complement traditional front-page citation data. We classify each citation into one of 14 categories and focus on the 265,000 references to scientific literature, which we parse, clean, and disambiguate using machine learning and external bibliographic services. To enhance reusability, disambiguated records are linked to OpenAlex, a comprehensive research metadata platform. The dataset enables new research on examiner behavior, science-technology linkages, and the construction of citation-based metrics. All data and code are openly available to facilitate reuse across disciplines.
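To make the disambiguation step more concrete, the sketch below matches a raw citation string against a small list of candidate records by title-token overlap. It is a toy illustration only, not the authors' pipeline, which the abstract describes as combining machine learning with external bibliographic services. The function names, the `threshold` value, the sample citation, and the record identifiers `W1` and `W2` are all hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def title_overlap(raw_citation: str, title: str) -> float:
    """Fraction of distinct title tokens that also appear in the raw citation."""
    citation_tokens = set(normalize(raw_citation).split())
    title_tokens = set(normalize(title).split())
    if not title_tokens:
        return 0.0
    return len(title_tokens & citation_tokens) / len(title_tokens)


def best_match(raw_citation, candidates, threshold=0.6):
    """Return (candidate, score) for the best-scoring title.

    The candidate is None when no score reaches the (assumed) threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        score = title_overlap(raw_citation, cand["title"])
        if score > best_score:
            best, best_score = cand, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical example data: a made-up citation and made-up records.
raw = 'Smith et al., "Deep learning for protein folding," Nature, vol. 5, 2019, pp. 1-10.'
candidates = [
    {"id": "W1", "title": "Deep Learning for Protein Folding"},
    {"id": "W2", "title": "Protein crystallography methods"},
]
match, score = best_match(raw, candidates)
```

Token overlap is used here because a raw citation also carries authors, venue, and page numbers, so a whole-string similarity against the title alone would be misleadingly low. A production matcher would need to handle abbreviated or truncated titles and would likely draw on additional fields such as year and author names.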

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12963361/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12963361/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12963361/full.md

---
Source: https://tomesphere.com/paper/PMC12963361