# Integrating graph convolutional networks with large language models for structured biomedical material knowledge representation

**Authors:** Mufei Li, Yan Zhuang, Yao Hou, Ke Chen, Lin Han, Kefeng Wang, Xiangfeng Li, Xiangdong Zhu, Mingli Yang, Guangfu Yin, Jiangli Lin, Xingdong Zhang

PMC · DOI: 10.1093/rb/rbaf083 · Regenerative Biomaterials · 2025-08-09

## TL;DR

This paper introduces a framework that combines natural language processing (NLP) with graph convolutional networks to extract structured data from biomedical materials literature, making database construction more efficient.

## Contribution

A novel hierarchical NLP framework integrating graph convolutional networks for biomedical materials knowledge extraction.
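This summary does not give the GCN's architecture. The sketch below assumes the widely used symmetric-normalized propagation rule of Kipf and Welling, $H' = \mathrm{ReLU}(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W)$ with $\hat{A} = A + I$, and shows one layer over a small mention graph. Nodes stand for entity mentions and edges for within- or cross-sentence links. The function names, the toy graph and the feature values are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Self-loops let each node keep part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 thanks to self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN step: ReLU(norm_adj @ H @ W). Rows of h are node features."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


if __name__ == "__main__":
    # Hypothetical mention graph: nodes 0-1 share a sentence, 1-2 are linked across sentences.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    h = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]
    for row in gcn_layer(adj, h, w):
        print([round(v, 3) for v in row])
```

Stacking a few such layers mixes information from neighboring sentences into each mention's representation. Candidate mention pairs could then be scored to decide co-reference. That scoring step is omitted because the summary does not specify it.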

## Key findings

- The sentence-level model achieves 84.7% accuracy in entity and relation extraction.
- The GCN-based module achieves 84.0% accuracy in resolving cross-sentence co-references.
- The framework enables scalable and extensible biomedical materials knowledge graph construction.
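The summary does not describe the graph schema. The sketch below assumes that extracted relations arrive as (head, relation, tail) triples and shows why a triple-based store extends easily. New relation types need no schema change. Duplicate extractions from different papers collapse into one edge while their provenance is kept. The class, the relation names and the example entities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    """In-memory graph: entities are nodes, extracted relations are labelled edges."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
        self.sources: Dict[Triple, Set[str]] = defaultdict(set)

    def add(self, triple: Triple, source: str) -> None:
        # Sets deduplicate repeated extractions; `sources` records which texts support each triple.
        self.edges[triple.head].add((triple.relation, triple.tail))
        self.sources[triple].add(source)

    def neighbors(self, head: str, relation: Optional[str] = None) -> List[str]:
        """Sorted tails reachable from `head`, optionally filtered by relation label."""
        return sorted(t for r, t in self.edges.get(head, set())
                      if relation is None or r == relation)

    def __len__(self) -> int:
        # Number of distinct (head, relation, tail) edges.
        return sum(len(v) for v in self.edges.values())


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add(Triple("hydroxyapatite", "synthesized_by", "sol-gel"), source="paper_A")
    kg.add(Triple("hydroxyapatite", "synthesized_by", "sol-gel"), source="paper_B")
    kg.add(Triple("hydroxyapatite", "has_morphology", "nanorod"), source="paper_A")
    print(len(kg), kg.neighbors("hydroxyapatite", "synthesized_by"))
```

A production system would more likely use a graph database. The triple-plus-provenance shape is the part that carries over.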

## Abstract

Automated literature mining is key to building structured biomedical materials databases, yet current methods struggle with large publication volumes, complex entity relations and domain-specific terminology. We propose a hierarchical natural language processing (NLP) framework for extracting structured data from biomedical materials texts. Our pipeline uses named entity recognition (NER) to identify entities such as compositions, synthesis methods and properties. Sentence-level relation extraction captures direct associations (e.g. temperature, morphology), while a paragraph-level graph convolutional network (GCN) module resolves cross-sentence co-references. Rule-based templates enhance precision in specific cases. Extracted relations are integrated into a biomedical materials knowledge graph, enabling scalable and extensible data representation. Experiments show that the sentence-level model achieves 84.7% accuracy and the GCN-based module achieves 84.0%. This approach offers an efficient pipeline for structuring complex scientific texts, reducing manual effort and supporting large-scale knowledge extraction in biomedical materials and related domains.
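The abstract says rule-based templates improve precision in specific cases but does not list them. The sketch below assumes that NER has already tagged the material entities in a sentence and shows how one such template could attach a processing temperature to a material. The regular expression, the `MATERIAL` label and the relation names are hypothetical, not the authors' rules.

```python
import re
from typing import List, Tuple

# Hypothetical template: a synthesis verb followed by "at <number> °C" (also "900°C" or "900 C").
TEMP_RULE = re.compile(r"\b(sintered|calcined|annealed|heated)\s+at\s+(\d+(?:\.\d+)?)\s*°?\s*C\b")


def extract_temperature_relations(
    sentence: str, entities: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (material, relation, value) triples for one sentence.

    `entities` is the NER output for this sentence as (surface_text, label) pairs;
    only entities labelled "MATERIAL" that occur in the sentence are used as heads.
    """
    materials = [text for text, label in entities if label == "MATERIAL" and text in sentence]
    triples = []
    for match in TEMP_RULE.finditer(sentence):
        verb, value = match.group(1), match.group(2)
        for material in materials:
            triples.append((material, f"{verb}_at", f"{value} °C"))
    return triples


if __name__ == "__main__":
    s = "The CaO powder was calcined at 900 °C for 2 h."
    print(extract_temperature_relations(s, [("CaO", "MATERIAL")]))
```

A template like this sees only one sentence. When the material is named in an earlier sentence ("The powder was then calcined at..."), the rule finds no head. That is the cross-sentence case the paper assigns to its paragraph-level GCN module.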

## Full-text entities

- **Genes:** ECD (ecdysoneless cell cycle regulator) [NCBI Gene 11319] {aka GCR2, HSGT1, SGT1}
- **Diseases:** RE (MESH:D019973), toxicity (MESH:D064420)
- **Chemicals:** CD (MESH:D002104), CaO (MESH:C016538), EC (-), Ca(OH)2 (MESH:D002126)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12639542/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12639542/full.md

## References

55 references; the full list is in the complete paper: https://tomesphere.com/paper/PMC12639542/full.md

---
Source: https://tomesphere.com/paper/PMC12639542