# Quantifying Innovation in Stroke: Large Language Model Bibliometric Analysis

**Authors:** Adam Marcus, Georgina Lockwood-Taylor, Daniel Rueckert, Paul Bentley

PMC · DOI: 10.2196/70754 · Journal of Medical Internet Research · 2026-01-20

## TL;DR

This study uses a large language model to analyze stroke-related patents and publications, identifying which areas of stroke innovation are growing fastest.

## Contribution

A novel use of a large language model to filter stroke-related patents, combined with mapping the patent and publication growth of each resulting innovation cluster onto an innovation life cycle.

## Key findings

- A large language model achieved 99.2% cross-validated accuracy (96.5% sensitivity, 99.6% specificity) in identifying stroke-related patents.
- Pharmacological treatment patents have plateaued since the late 2000s, while AI methods, rehabilitation devices, and medical imaging show exponential growth.
- LLMs offer a scalable method to quantify innovation in healthcare.

## Abstract

Thrombolysis and mechanical thrombectomy represent the most successful stroke innovations over the last 30 years. Quantifying innovation in stroke is essential for identifying productive research lines and prioritizing funding, but health care lacks validated methods for measuring innovation.

This study aimed to systematically evaluate the relationship between stroke-related patents and publications, demonstrate the feasibility of using large language models (LLMs) in this process, and identify the most rapidly advancing innovations in stroke care by mapping them to a theoretical innovation life cycle.

The Open Patent Services (European Patent Office) and PubMed databases were searched for “stroke OR cerebrovascular” over the period 1993 to 2023. In this bibliometric patent-publication analysis, a 13 billion–parameter Llama LLM was trained on a manually labeled subset of 5000 patents to distinguish patents related to stroke disease from other uses of the word “stroke,” and its performance was assessed using 5-fold cross-validation. The LLM filtered out irrelevant results, and the codes of the remaining patents were grouped into innovation clusters. For each cluster, annual patent and publication counts were normalized to adjust for global trends, and cluster-specific growth curves were plotted to analyze the rates and characteristics of growth. The innovation life cycle stage of each cluster was estimated by fitting a sigmoid curve to its patent and publication data, consistent with Rogers' diffusion of innovations theory.
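The life-cycle step can be illustrated with a small sketch. It assumes the sigmoid is a three-parameter logistic fitted to the *cumulative* counts of one cluster. It also swaps in a simple fitting strategy, a grid search over the ceiling K combined with a linear fit of the logit, in place of whatever nonlinear least-squares routine the authors used. The names `fit_logistic`, `rogers_stage`, and `ROGERS_CUTOFFS` are hypothetical. Reading the stage off Rogers' adopter-category cut-offs (2.5%, 16%, 50%, and 84% of eventual adoption) is an assumed mapping, not the paper's stated rule.

```python
import math

# Hypothetical mapping from the fitted cumulative share (y / K) to Rogers'
# adopter categories: innovators 2.5%, early adopters 13.5%,
# early majority 34%, late majority 34%, laggards 16%.
ROGERS_CUTOFFS = [
    (0.025, "innovators"),
    (0.16, "early adopters"),
    (0.50, "early majority"),
    (0.84, "late majority"),
]


def fit_logistic(years, cumulative):
    """Fit y(t) = K / (1 + exp(-r * (t - t0))) to cumulative counts.

    Hypothetical fitting strategy: grid-search the ceiling K. For each
    candidate, ln(y / (K - y)) = r * t - r * t0 is linear in t, so r and
    t0 come from ordinary least squares. Keep the K whose curve has the
    smallest squared error on the original count scale.
    """
    pts = [(t, y) for t, y in zip(years, cumulative) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three positive data points")
    y_max = max(y for _, y in pts)
    best = None
    for i in range(1, 4001):
        k = y_max * (1 + 0.001 * i)  # ceilings from 1.001x to 5x the largest value
        logits = [(t, math.log(y / (k - y))) for t, y in pts]
        n = len(logits)
        mean_t = sum(t for t, _ in logits) / n
        mean_z = sum(z for _, z in logits) / n
        sxx = sum((t - mean_t) ** 2 for t, _ in logits)
        r = sum((t - mean_t) * (z - mean_z) for t, z in logits) / sxx
        if r <= 0:
            continue  # a non-increasing logit is not an S-curve of adoption
        t0 = mean_t - mean_z / r  # the line's intercept equals -r * t0
        sse = sum((y - k / (1 + math.exp(-r * (t - t0)))) ** 2 for t, y in pts)
        if best is None or sse < best[0]:
            best = (sse, k, r, t0)
    if best is None:
        raise ValueError("no increasing logistic fit found")
    _, k, r, t0 = best
    return k, r, t0


def rogers_stage(cumulative_now, k):
    """Label the current position on the fitted curve (hypothetical rule)."""
    share = cumulative_now / k
    for cutoff, label in ROGERS_CUTOFFS:
        if share < cutoff:
            return label
    return "laggards"
```

Under this reading, consider a cluster whose cumulative count already exceeds 84% of its fitted ceiling K. It lands in the final category, which would resemble the saturation the abstract reports for pharmacological treatment. A cluster still below its inflection point t0 (under 50% of K) would sit in an earlier, accelerating phase.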

The cross-validated accuracy of the LLM was 99.2%, with a sensitivity of 96.5% and a specificity of 99.6%. An initial bibliometric search retrieved 237,035 patents and 486,664 research publications. A manual review of a random sample of 500 patents before filtering revealed that only 11.2% (56/500) were relevant to stroke. After LLM filtering, 28,225 of the 237,035 patents (11.9%) remained as stroke-related. These were grouped into 7 innovation clusters: pharmacological treatment, alternative medicine, rehabilitation devices, medical imaging, diagnostic testing, surgical devices, and artificial intelligence (AI) methods. Patent and publication counts were strongly correlated in 5 of the 7 clusters (Spearman rs=0.65-0.92; P<.006). The exceptions were pharmacological treatment (rs=0.09) and alternative medicine (rs=0.55). Pharmacological treatment was the top-performing cluster over the last 30 years, accounting for 49.3% (36,005/73,094) of all patents, but patent activity in this area has plateaued since the late 2000s. AI methods, rehabilitation devices, and medical imaging exhibited exponential patent growth, with annual normalized increases of 39.2%, 15.9%, and 5.8%, respectively, compared with 16.9%, 5.3%, and 2.2% for publications.
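The growth and correlation figures rest on a few simple computations on annual series. The sketch below is one plausible reading, not the authors' code. `normalize` assumes that "adjusting for global trends" means dividing each year's cluster count by that year's database-wide total. `annual_growth_rate` assumes the reported annual increase is the compound rate implied by a log-linear least-squares fit, so 39.2% would correspond to a return value of 0.392. `spearman` computes Spearman's rho as the Pearson correlation of average ranks and does not compute a P value. All function names are hypothetical.

```python
import math


def normalize(cluster_counts, global_counts):
    """Hypothetical normalization: each year's cluster count divided by the
    total count for that year across the whole database."""
    return [c / g for c, g in zip(cluster_counts, global_counts)]


def annual_growth_rate(years, values):
    """Compound annual growth implied by a least-squares fit of
    ln(value) = a + b * year; returns exp(b) - 1 (0.10 means +10% per year).
    Non-positive values are skipped because their logarithm is undefined."""
    pts = [(t, math.log(v)) for t, v in zip(years, values) if v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive values")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return math.exp(sxy / sxx) - 1


def _average_ranks(xs):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors
    (no P value is computed)."""
    rx, ry = _average_ranks(xs), _average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

In this reading, `spearman` would be applied to the paired annual patent and publication series of a single cluster. As a check, `annual_growth_rate([2000, 2001, 2002, 2003], [1, 2, 4, 8])` returns approximately 1.0, meaning the series doubles every year.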

Applying an LLM to publicly available patent and publication data provides a scalable way to quantify innovation in stroke. Pharmacological treatment appears to have entered a saturation phase, whereas AI methods, rehabilitation devices, and medical imaging remain in rapid growth, highlighting areas of greatest traction for future research and investment.

## Linked entities

- **Diseases:** stroke (MONDO:0005098)

## Full-text entities

- **Diseases:** cerebrovascular (MESH:D002561), depression (MESH:D003866), Stroke (MESH:D020521), Neurological Disorders (MESH:D009461)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12869152/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12869152/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12869152/full.md

---
Source: https://tomesphere.com/paper/PMC12869152