Liver cancer knowledge graph construction based on dynamic entity replacement and masking strategies RoBERTa-wwm-large-BiLSTM-CRF model with clinical Chinese EMRs
Yichi Zhang, Xiaojun Hu, Hailing Wang, Ke Liu, Yongbin Gao, Xiaoyan Jiang, Yingfang Fan, Zhijun Fang

TL;DR
This paper introduces a new framework to build a liver cancer knowledge graph using real-world Chinese electronic medical records, improving entity recognition and integration of clinical data.
Contribution
The study proposes a novel named entity recognition (NER) model based on Dynamic Entity Replacement and Masking (DERM) and constructs the first Chinese liver cancer knowledge graph from real-world clinical data.
Findings
The proposed NER model achieved an F1 score of 93.96% on the RLC-EMRs dataset.
The liver cancer knowledge graph contains 46,364 entities and 296,655 semantic relationships.
A KG-based retrieval system was developed for querying clinical information like complications and medications.
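To make the retrieval finding concrete, the following is a minimal sketch of how a knowledge graph stored as (head, relation, tail) triples could answer questions about complications and medications. It is not the authors' system: the triples are toy examples rather than data from the paper's KG, and the relation names `has_complication` and `treated_with` and the helpers `build_index` and `query` are hypothetical.

```python
from collections import defaultdict

# Toy triples for illustration only; NOT taken from the paper's knowledge graph.
# Relation names are hypothetical.
TRIPLES = [
    ("liver cancer", "has_complication", "ascites"),
    ("liver cancer", "has_complication", "hepatic encephalopathy"),
    ("liver cancer", "treated_with", "sorafenib"),
    ("ascites", "treated_with", "spironolactone"),
]


def build_index(triples):
    """Group tail entities by (head, relation) for constant-time lookup."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[(head, relation)].append(tail)
    return index


def query(index, head, relation):
    """Return all tails linked to `head` via `relation` (empty list if none)."""
    return list(index.get((head, relation), []))


INDEX = build_index(TRIPLES)
complications = query(INDEX, "liver cancer", "has_complication")
medications = query(INDEX, "liver cancer", "treated_with")
```

A graph of the size reported here would more plausibly sit in a graph database, but the lookup pattern (fix a head entity and a relation type, collect the tails) is the same idea.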
Abstract
Liver cancer is a leading cause of cancer-related mortality worldwide, necessitating advanced tools for diagnosis and management. Knowledge graphs (KGs) are crucial for advancing smart healthcare, but existing liver cancer-specific KGs are mostly derived from literature or public databases and lack integration with real-world clinical data [e.g., Electronic Medical Records (EMRs)], creating a critical gap. Furthermore, there are currently no publicly available KGs specifically for liver cancer, leaving a significant gap in structured clinical knowledge resources. This study proposes a novel framework to construct the first Chinese liver cancer KG from Real-World Liver Cancer Electronic Medical Records (RLC-EMRs). A new named entity recognition (NER) model, DERM-RoBERTa-wwm-large-BiLSTM-CRF, was developed; it uses a Dynamic Entity Replacement and Masking (DERM) strategy to address data…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
