A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text
Enwei Zhu, Qilin Sheng, Huanwan Yang, Jinpeng Li

TL;DR
This paper introduces a comprehensive, unified framework for medical information extraction from Chinese clinical texts, including annotation, modeling, and evaluation, leveraging deep learning and pre-trained language models.
Contribution
It presents a unified annotation scheme, develops neural network models for multiple tasks, and releases a large annotated corpus and code for Chinese medical text processing.
Findings
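Achieved inter-annotator agreement F1 scores of 94.53%, 73.73% and 91.98% for entity recognition, relation extraction and attribute extraction, respectively.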
Attained F1 scores of over 93% for entity recognition.
Demonstrated effective extraction of relations and attributes.
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks, which collaboratively convert clinical text to pre-defined structured formats. Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques and thus require massive annotated linguistic data. This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction, which are unified in annotation, modeling and evaluation. Specifically, the annotation scheme is comprehensive, and compatible between tasks, especially for the medical relations. The resulting annotated corpus includes 1,200 full medical records (or 18,039 broken-down documents), and achieves inter-annotator agreements (IAAs) of 94.53%, 73.73% and 91.98% F1 scores for the three tasks. Three task-specific neural network models are developed within a…
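The abstract reports inter-annotator agreement as F1 scores. As a rough illustration of what such a figure can mean, the sketch below computes F1 between two annotators' entity sets, treating one annotator as the reference. The exact-match criterion (same start, end and label), the `Span` type, the `pairwise_f1` helper and the example labels are all assumptions made for this illustration; the paper's actual agreement protocol is not described in the text above.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical entity annotation: character offsets plus a type label."""
    start: int
    end: int
    label: str


def pairwise_f1(reference: set[Span], other: set[Span]) -> float:
    """F1 of `other` against `reference`, counting only exact span+label matches."""
    if not reference and not other:
        return 1.0  # two empty annotation sets agree trivially
    matched = len(reference & other)
    if matched == 0:
        return 0.0
    precision = matched / len(other)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy example: the annotators agree on two entities and disagree on one boundary.
annotator_a = {Span(0, 2, "Disease"), Span(5, 8, "Drug"), Span(10, 12, "Symptom")}
annotator_b = {Span(0, 2, "Disease"), Span(5, 8, "Drug"), Span(10, 13, "Symptom")}
score = pairwise_f1(annotator_a, annotator_b)
```

With exact matching, F1 is symmetric in the two annotators, so the choice of reference does not change the score. A relaxed (overlap-based) criterion would generally give higher agreement than this strict one.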
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
