# ERNIE-UIE: Advancing information extraction in Chinese medical knowledge graph

**Authors:** Bei Li, Changbiao Li, Jianwei Sun, Xu Zeng, Xiaofan Chen, Jing Zheng, Emiliano Damian Alvarez Leites

PMC · DOI: 10.1371/journal.pone.0325082 · PLOS One · 2025-05-29

## TL;DR

This paper introduces ERNIE-UIE, a model that improves Chinese medical knowledge graph construction with minimal annotated data.

## Contribution

ERNIE-UIE optimizes knowledge graph construction using a generative extraction paradigm with limited annotations.

## Key findings

- The model extracted 8,525 entities and 9,522 triples for the medical knowledge graph.
- Graph algorithms verified the accuracy and efficacy of the constructed knowledge graph.
- The approach reduces reliance on large annotated datasets for knowledge graph development.

## Abstract

The field of information extraction (IE) is currently exploring more versatile and efficient methods that minimize reliance on extensive annotated datasets and integrate knowledge across tasks and domains.

We aim to evaluate and refine the application of universal IE (UIE) technology to Chinese medical knowledge, with respect to both processing accuracy and efficiency.

Our model integrates ontology modeling, web scraping, UIE, fine-tuning strategies, and graph databases, thereby covering knowledge modeling, extraction, and storage. The Enhanced Representation through Knowledge Integration-UIE (ERNIE-UIE) model is fine-tuned and optimized using a small amount of annotated data. A medical knowledge graph is then constructed and validated, and knowledge mining is performed on the data stored within it.
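Part of why a small annotated set can go a long way in UIE-style fine-tuning is that each annotated sentence is expanded into several prompt-conditioned span-extraction examples: one per entity type and one per (subject, relation) pair. The sketch below assumes a record layout modeled on the one PaddleNLP uses for UIE fine-tuning data (`content`, `prompt`, `result_list`); the sentence, the schema labels (`疾病` disease, `症状` symptom, `临床表现` clinical manifestation), and the helper names `span` and `build_examples` are hypothetical and not taken from the paper.

```python
import json

# Hypothetical annotated sentence ("Headache is common in patients with
# hypertensive cerebral hemorrhage"); spans use [start, end) character offsets.
sentence = "高血压脑出血患者常见头痛。"
entities = [
    {"id": 0, "label": "疾病", "start": 0, "end": 6},    # disease: 高血压脑出血
    {"id": 1, "label": "症状", "start": 10, "end": 12},  # symptom: 头痛
]
relations = [{"from": 0, "to": 1, "type": "临床表现"}]  # "clinical manifestation"


def span(text, ent):
    """Return the span record for one annotated entity."""
    return {"text": text[ent["start"]:ent["end"]], "start": ent["start"], "end": ent["end"]}


def build_examples(text, entities, relations):
    """Expand one annotated sentence into prompt-conditioned span-extraction examples."""
    examples = []
    # Entity examples: the prompt is the entity type; the answers are all spans of that type.
    by_label = {}
    for ent in entities:
        by_label.setdefault(ent["label"], []).append(span(text, ent))
    for label, spans in by_label.items():
        examples.append({"content": text, "prompt": label, "result_list": spans})
    # Relation examples: the prompt is "<subject>的<relation>"; the answers are the objects.
    by_id = {ent["id"]: ent for ent in entities}
    by_prompt = {}
    for rel in relations:
        subj = by_id[rel["from"]]
        prompt = text[subj["start"]:subj["end"]] + "的" + rel["type"]
        by_prompt.setdefault(prompt, []).append(span(text, by_id[rel["to"]]))
    for prompt, spans in by_prompt.items():
        examples.append({"content": text, "prompt": prompt, "result_list": spans})
    return examples


for example in build_examples(sentence, entities, relations):
    print(json.dumps(example, ensure_ascii=False))
```

Here one sentence with two entities and one relation yields three training examples. Real conversion scripts typically also add negative (empty-answer) prompts so the model learns when not to extract; this sketch omits them.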

Incorporating the characteristics of whole-course management, we implemented a comprehensive model and methodology for constructing a medical knowledge graph. Entities and relationships were jointly extracted using the pretrained language model, yielding 8,525 entities and 9,522 triples. The accuracy of the knowledge graph was verified using graph algorithms.
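Reporting entity and triple counts requires flattening the extractor's nested output and deduplicating it before the data is loaded into a graph. The sketch below assumes a nested result shape resembling UIE-style output (`{entity_type: [span, ...]}`, where a span may carry a `relations` mapping); the sample results and the names `to_triples` and `pagerank` are hypothetical, and how the paper itself counted entities is not stated here. The summary also does not say which graph algorithms were used for validation, so PageRank appears only as one common example of an algorithm that surfaces central nodes for a sanity check.

```python
from collections import defaultdict

# Hypothetical extractor output for two sentences. Each head span may carry a
# "relations" mapping {relation_type: [tail span, ...]}.
results = [
    {"疾病": [{"text": "高血压脑出血",
              "relations": {"临床表现": [{"text": "头痛"}, {"text": "颅内高压"}]}}]},
    {"疾病": [{"text": "高血压脑出血",
              "relations": {"并发症": [{"text": "肺炎"}],
                            "临床表现": [{"text": "头痛"}]}}]},
]


def to_triples(results):
    """Flatten nested results into unique entity names and (head, relation, tail) triples."""
    nodes, triples = set(), set()
    for result in results:
        for spans in result.values():
            for head in spans:
                nodes.add(head["text"])
                for rel, tails in head.get("relations", {}).items():
                    for tail in tails:
                        nodes.add(tail["text"])
                        triples.add((head["text"], rel, tail["text"]))  # set drops duplicates
    return nodes, triples


def pagerank(triples, damping=0.85, iterations=50):
    """Power-iteration PageRank over the directed head -> tail graph."""
    out_links = defaultdict(set)
    nodes = set()
    for head, _, tail in triples:
        out_links[head].add(tail)
        nodes.update((head, tail))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without out-links spread their rank uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        rank = new
    return rank


nodes, triples = to_triples(results)
print(len(nodes), "entities,", len(triples), "triples")
ranks = pagerank(triples)
for node, score in sorted(ranks.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{node}\t{score:.4f}")
```

The repeated (高血压脑出血, 临床表现, 头痛) triple is counted once, so the sample yields 4 entities and 3 triples. In a graph database, the same deduplication is usually enforced at load time by merging on node and relationship keys.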

We optimized the construction of a Chinese medical knowledge graph with minimal annotated data by using a generative extraction paradigm, and validated the efficacy of the resulting graph. This approach addresses the shortage of annotated training corpora in low-resource knowledge graph construction, thereby reducing the cost of developing knowledge graphs.
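In the original UIE formulation (Lu et al., ACL 2022), the generative paradigm conditions the model on a structural schema instructor (SSI) prefix naming the target types, and the model emits a linearized Structured Extraction Language (SEL) string in which relations nest inside the entity they attach to. This summary does not describe ERNIE-UIE's exact input and output formats, so the sketch below only illustrates the SSI/SEL idea; the function names `ssi_prompt` and `to_sel` and the example labels are hypothetical.

```python
def ssi_prompt(spots, assos, text):
    """Build an SSI-style prefix listing entity types ([spot]) and relations ([asso])."""
    prefix = "".join(f"[spot] {s} " for s in spots)
    prefix += "".join(f"[asso] {a} " for a in assos)
    return prefix + "[text] " + text


def to_sel(records):
    """Linearize (entity_type, span, [(relation, target_span), ...]) records into SEL."""
    parts = []
    for ent_type, span, assocs in records:
        # Relations are nested inside the parentheses of their head entity.
        inner = "".join(f" ({rel}: {target})" for rel, target in assocs)
        parts.append(f"({ent_type}: {span}{inner})")
    return "(" + " ".join(parts) + ")"


text = "Headache is common in hypertensive cerebral hemorrhage."
records = [
    ("disease", "hypertensive cerebral hemorrhage", [("symptom", "headache")]),
    ("symptom", "headache", []),
]
print(ssi_prompt(["disease", "symptom"], ["symptom"], text))
print(to_sel(records))
```

The SEL string serves as the generation target during fine-tuning; at inference, the generated string is parsed back into entities and triples.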

## Full-text entities

- **Diseases:** intracerebral hemorrhage (MESH:D002543), aspiration (MESH:D011015), Symptom (MESH:D012816), hallucination (MESH:D006212), Head CT (MESH:D006258), intracranial hypertension (MESH:D019586), swelling (MESH:D004487), Hypertension (MESH:D006973), hypertensive cerebral hemorrhage (MESH:D020299), Headache (MESH:D006261), pneumonia (MESH:D011014), Hematoma (MESH:D006406), respiratory failure or distress (MESH:D012131)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12121792/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12121792/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12121792/full.md

---
Source: https://tomesphere.com/paper/PMC12121792