Tracing the Flow of Knowledge From Science to Technology Using Deep Learning

Michael E. Rose; Mainak Ghosh; Sebastian Erhardt; Cheng Li; Erik Buunk; Dietmar Harhoff

arXiv:2512.24259 · cs.CL · January 1, 2026

Open Access

TL;DR

This paper introduces Pat-SPECTER, a deep learning model that effectively measures semantic similarity between patents and scientific papers, aiding in understanding the knowledge transfer from science to technology.

Contribution

The paper presents Pat-SPECTER, a language similarity model obtained by fine-tuning SPECTER2 on patents, which outperforms the other evaluated models in predicting credible patent-paper citations.

Findings

01

Pat-SPECTER outperforms other models in citation prediction tasks.

02

US patents cite papers that are semantically less similar than those cited by patents in other large jurisdictions, possibly because of the duty of candor.

03

The model is publicly available for research and practical use.

Abstract

We develop a language similarity model suitable for working with patents and scientific publications at the same time. In a horse-race-style evaluation, we task eight language (similarity) models with predicting credible patent-paper citations. We find that our Pat-SPECTER model, which is the SPECTER2 model fine-tuned on patents, performs best. In two real-world scenarios (separating patent-paper pairs and predicting patent-paper pairs), we demonstrate the capabilities of Pat-SPECTER. Finally, we test the hypothesis that US patents cite papers that are semantically less similar than those cited by patents in other large jurisdictions, which we posit is because of the duty of candor. The model is open to the academic community and practitioners alike.
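The idea of scoring patent-paper pairs by semantic similarity can be illustrated with a minimal sketch. It assumes each document has already been turned into an embedding vector and that similarity is measured with cosine similarity; the abstract does not state which similarity measure the authors use, and the vectors, identifiers, and function names below are hypothetical toy values rather than Pat-SPECTER output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def rank_papers(patent_vec, paper_vecs):
    """Return (paper_id, score) pairs sorted from most to least similar."""
    scores = {pid: cosine_similarity(patent_vec, vec)
              for pid, vec in paper_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings; real model embeddings are much larger.
patent = [0.9, 0.1, 0.0]
papers = {
    "paper_a": [0.8, 0.2, 0.1],
    "paper_b": [0.0, 0.1, 0.9],
    "paper_c": [0.5, 0.5, 0.0],
}

ranking = rank_papers(patent, papers)
for paper_id, score in ranking:
    print(f"{paper_id}: {score:.3f}")
```

In a setting like the one described, the top-ranked papers would be candidate citations for the patent, and a score threshold could be used to separate plausible pairs from implausible ones. Both uses here are illustrative assumptions, not the authors' evaluation procedure.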

Taxonomy

Topics: Intellectual Property and Patents · Machine Learning in Materials Science · Language and cultural evolution