DiaKG: an Annotated Diabetes Dataset for Medical Knowledge Graph Construction
Dejie Chang, Mosha Chen, Chaozhen Liu, Liping Liu, Dongdong Li, Wei Li, Fei Kong, Bangchang Liu, Xiaobin Luo, Ji Qi, Qiao Jin, Bin Xu

TL;DR
This paper introduces DiaKG, a high-quality Chinese dataset with over 22,000 entities and nearly 7,000 relations, designed to advance the construction of diabetes knowledge graphs and support AI applications in the medical domain.
Contribution
The paper provides a new annotated dataset for diabetes knowledge graph construction and benchmarks existing NER and relation extraction methods on it.
Findings
Existing methods find DiaKG challenging, indicating room for improvement.
Benchmark results highlight the dataset's complexity and utility for future research.
Analysis suggests directions for enhancing medical knowledge extraction techniques.
Abstract
Knowledge graphs have proven effective in modeling structured information and conceptual knowledge, especially in the medical domain. However, the lack of high-quality annotated corpora remains a crucial obstacle to advancing research and applications in this area. To accelerate research on domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a high-quality Chinese dataset for diabetes knowledge graphs, which contains 22,050 entities and 6,890 relations in total. We implement representative recent methods for Named Entity Recognition and Relation Extraction as a benchmark to evaluate the proposed dataset thoroughly. Empirical results show that DiaKG is challenging for most existing methods, and further analysis is conducted to discuss future research directions for improvement. We hope the release of this dataset can assist the construction of…
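The summary does not state how the NER and relation extraction benchmarks are scored. As a minimal sketch, assuming the common exact-match, micro-averaged precision/recall/F1 convention, the function below treats each entity as a (start, end, type) span and each relation as a (head span, relation, tail span) triple, and counts a prediction as correct only if it matches a gold item exactly. The function name `micro_prf`, the offsets, and the labels "Disease", "Drug", "Symptom", "Test", and "Treats" are hypothetical illustrations, not DiaKG's actual schema or evaluation code.

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match micro precision, recall and F1 over sets of annotations.

    Works for NER spans (start, end, type) and relation triples
    (head_span, relation, tail_span) alike: an item counts as correct
    only if it appears identically in both the gold and predicted sets.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # true positives: exact matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical character-offset spans; the type labels are illustrative only.
gold_entities = [(0, 3, "Disease"), (10, 14, "Drug"), (20, 25, "Symptom")]
pred_entities = [(0, 3, "Disease"), (10, 14, "Test")]  # second span has the wrong type

# Hypothetical relation triples: (head span, relation label, tail span).
gold_relations = [((10, 14), "Treats", (0, 3))]
pred_relations = [((10, 14), "Treats", (0, 3))]

print(micro_prf(gold_entities, pred_entities))
print(micro_prf(gold_relations, pred_relations))
```

Under this strict convention a span with the right boundaries but the wrong type earns no credit, which is one reason fine-grained medical schemas tend to yield lower scores; lenient boundary or type matching would be a different (also common) design choice.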
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
