# Network Analysis of the Organic Chemistry in Patents, Literature, and Pharmaceutical Industry

**Authors:** Emma Svensson, Emma Granqvist, Tomas Bastys, Christos Kannas, Mikhail Kabeshov, Samuel Genheden, Ola Engkvist, Thierry Kogej

PMC · DOI: 10.1002/minf.202500011 · Molecular Informatics · 2025-07-18

## TL;DR

This paper compares chemical reaction networks built from patents (USPTO), the literature (Reaxys), and AstraZeneca's in-house electronic lab notebook (ELN) data to understand how their structures differ and what this means for synthesis prediction in drug discovery.

## Contribution

The study brings in-house electronic lab notebook data into chemical reaction network analysis, which had so far relied on public sources, and shows that its network structure differs from that of the public patent and literature data.

## Key findings

- Reaxys has the most interconnected network, with the largest proportion of nodes belonging to the core.
- USPTO has low connectivity and only a small core, while ELN has intermediate connectivity and no core.
- Hub molecules are most similar between ELN and USPTO, and in both sources they are dominated by small organic building blocks.
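The three findings rest on standard graph measures: connectivity, core size, and hub ranking. The sketch below shows one way to compute them on a directed molecule graph. It assumes (i) an edge runs from each reactant to each product, (ii) the "core" is taken to be the largest strongly connected component, i.e. the molecules that can be both made from and converted into one another, and (iii) hubs are ranked by total degree. These definitions and all function names are illustrative assumptions; the paper's exact metrics may differ.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Directed adjacency sets from (reactant, product) pairs; self-loops are skipped."""
    adj, nodes = defaultdict(set), set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:
            adj[src].add(dst)
    return adj, nodes


def strongly_connected_components(adj, nodes):
    """Kosaraju's algorithm with explicit stacks (avoids Python's recursion limit)."""
    # Pass 1: depth-first search on the original graph, recording finish order.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, successors = stack[-1]
            nxt = next((n for n in successors if n not in visited), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                visited.add(nxt)
                stack.append((nxt, iter(adj[nxt])))
    # Pass 2: search the reversed graph in reverse finish order; each tree is one SCC.
    radj = defaultdict(set)
    for src, dsts in adj.items():
        for dst in dsts:
            radj[dst].add(src)
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in radj[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def network_summary(edges, top_k=3):
    """Connectivity, core fraction, and hub ranking for a reactant -> product graph."""
    adj, nodes = build_adjacency(edges)
    n_edges = sum(len(dsts) for dsts in adj.values())
    largest = max(strongly_connected_components(adj, nodes), key=len)
    # A single molecule is not counted as a core: no cycle passes through it.
    core_fraction = len(largest) / len(nodes) if len(largest) > 1 else 0.0
    degree = defaultdict(int)  # in-degree + out-degree per molecule
    for src, dsts in adj.items():
        for dst in dsts:
            degree[src] += 1
            degree[dst] += 1
    hubs = sorted(degree, key=lambda m: (-degree[m], m))[:top_k]
    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "mean_degree": 2 * n_edges / len(nodes),
        "core_fraction": core_fraction,
        "hubs": hubs,
    }


if __name__ == "__main__":
    # Toy graph: A -> B -> C -> A is a cycle; D and E hang off it.
    toy = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "A"), ("E", "D")]
    print(network_summary(toy))
    # {'nodes': 5, 'edges': 6, 'mean_degree': 2.4, 'core_fraction': 0.6, 'hubs': ['A', 'C', 'B']}
```

Under this definition, a network in which no chain of reactions ever leads back to an earlier molecule has a core fraction of zero.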

## Abstract

Chemical reactions can be connected in large networks such as knowledge graphs. In this way, prior work has been able to draw meaningful conclusions about the properties and structures involved in organic chemistry reactions. However, the research has focused on public sources of organic synthesis that might lack the intricate details of the synthetic routes used in in‐house drug discovery. In this work, previous analyses are expanded to also include an in‐house electronic lab notebook (ELN) source, such that we can compare it to knowledge graphs that were constructed from US Patent and Trademark Office (USPTO) and Reaxys. We found that the Reaxys knowledge graph is the most interconnected and has the largest proportion of nodes belonging to the core, whereas the USPTO is much less connected and only has a small core. The ELN knowledge graph falls between these extremes in connectivity and it does not have any core. The hub molecules of ELN and USPTO are most similar, primarily represented by small, organic building blocks. We hypothesize that these differences can be attributed to the different origins of the data in the three sources. We discuss what impact this might have on synthesis prediction modelling.
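The abstract starts from the idea that individual reactions can be linked into one large graph. A minimal sketch of that step follows. It assumes each reaction arrives as a reaction SMILES string `reactants>agents>products` whose molecules are already canonicalized, and it builds a bipartite graph with molecule nodes and reaction nodes; agents are dropped. The function names and the `mol:`/`rxn:` node prefixes are hypothetical, and the paper's actual graph schema may differ. Splitting on `.` also separates multi-component species such as salts, which a real pipeline would need to handle.

```python
from collections import defaultdict


def parse_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into reactant and product tuples (agents dropped)."""
    reactants, _agents, products = rxn.split(">")
    return (
        tuple(s for s in reactants.split(".") if s),
        tuple(s for s in products.split(".") if s),
    )


def build_knowledge_graph(reactions):
    """Bipartite directed graph: molecule -> reaction (consumed), reaction -> molecule (formed)."""
    out_edges = defaultdict(set)
    for idx, rxn in enumerate(reactions):
        rxn_node = f"rxn:{idx}"
        reactants, products = parse_reaction_smiles(rxn)
        for mol in reactants:
            out_edges[f"mol:{mol}"].add(rxn_node)
        for mol in products:
            out_edges[rxn_node].add(f"mol:{mol}")
    return out_edges


def molecule_projection(reactions):
    """Collapse reaction nodes into direct reactant -> product edges."""
    edges = set()
    for rxn in reactions:
        reactants, products = parse_reaction_smiles(rxn)
        edges.update((r, p) for r in reactants for p in products)
    return edges


if __name__ == "__main__":
    reactions = [
        "CC(=O)O.NCc1ccccc1>>CC(=O)NCc1ccccc1",   # acetic acid + benzylamine -> N-benzylacetamide
        "CC(=O)NCc1ccccc1>O>NCc1ccccc1.CC(=O)O",  # amide hydrolysis; water listed as an agent
    ]
    for node, targets in build_knowledge_graph(reactions).items():
        print(node, "->", sorted(targets))
    print(len(molecule_projection(reactions)), "molecule-level edges")
```

The molecule-level projection is the form a degree or connected-component analysis would typically run on; keeping reaction nodes instead preserves which molecules were consumed together.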

Knowledge graphs built from AstraZeneca electronic lab notebook (ELN), US Patent and Trademark Office (USPTO), and Reaxys chemical reaction data are compared using various knowledge graph metrics.

© 2025 WILEY‐VCH GmbH
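One way to make a cross-source hub comparison concrete is to rank each source's molecules by degree and measure how much the top-k lists overlap. The sketch below uses the Jaccard index for this. Both the measure and the cutoff `k` are illustrative assumptions, not necessarily what the paper uses, and the source names and degree values in the example are made-up placeholders rather than results from ELN, USPTO, or Reaxys.

```python
from itertools import combinations


def top_hubs(degrees, k):
    """Return the k highest-degree molecules as a set (ties broken alphabetically)."""
    return set(sorted(degrees, key=lambda m: (-degrees[m], m))[:k])


def jaccard(a, b):
    """|A & B| / |A | B|; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0


def hub_similarity(degrees_by_source, k=3):
    """Pairwise Jaccard overlap of the top-k hub sets of each source."""
    hubs = {name: top_hubs(deg, k) for name, deg in degrees_by_source.items()}
    return {
        (s1, s2): jaccard(hubs[s1], hubs[s2])
        for s1, s2 in combinations(sorted(hubs), 2)
    }


if __name__ == "__main__":
    # Placeholder degree counts for three hypothetical sources.
    degrees = {
        "source_A": {"m1": 9, "m2": 7, "m3": 5, "m4": 1},
        "source_B": {"m1": 8, "m2": 6, "m4": 4, "m3": 2},
        "source_C": {"m5": 9, "m6": 8, "m1": 3, "m2": 1},
    }
    print(hub_similarity(degrees, k=3))
    # {('source_A', 'source_B'): 0.5, ('source_A', 'source_C'): 0.2, ('source_B', 'source_C'): 0.2}
```

A set-overlap measure ignores rank order within the top k; a rank-aware alternative would be needed if the ordering of hubs matters.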

## Full-text entities

- **Chemicals:** acetic anhydride (MESH:C031800), Pd (MESH:D010165), methanol (MESH:D000432), amide (MESH:D000577), carbon dioxide (MESH:D002245), BOC (-), C (MESH:D002244), ammonia (MESH:D000641), carbon monoxide (MESH:D002248), methyl iodide (MESH:C014055), acetone (MESH:D000096), BOC anhydride (MESH:C027600)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12273192/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12273192/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12273192/full.md

---
Source: https://tomesphere.com/paper/PMC12273192