Building a comprehensive syntactic and semantic corpus of Chinese clinical texts
Bin He, Bin Dong, Yi Guan, Jinfeng Yang, Zhipeng Jiang, Qiubin Yu, Jianyi Cheng, Chunyan Qu

TL;DR
This paper presents a comprehensive annotated corpus of Chinese clinical texts, including syntactic and semantic annotations, along with tools and guidelines, to advance NLP research in the clinical domain.
Contribution
The study introduces a new iterative annotation method, detailed guidelines, and a high-quality corpus with associated NLP tools for Chinese clinical texts.
Findings
The corpus contains 138 documents with 47,424 tokens and 2,553 full parsing trees.
Semantic annotations include 39,511 entities and 7,695 relations.
The NLP system modules trained on the corpus proved effective, and the annotation quality was high.
Abstract
Objective: To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. Materials and methods: An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. Results: The syntactic corpus consists of 138 Chinese…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
