Ontology-Driven and Weakly Supervised Rare Disease Identification from Clinical Notes
Hang Dong, Víctor Suárez-Paniagua, Huayu Zhang, Minhong Wang, Arlene Casey, Emma Davidson, Jiaoyan Chen, Beatrice Alex, William Whiteley, Honghan Wu

TL;DR
This paper presents a novel ontology-driven, weakly supervised method leveraging pre-trained contextual models to identify rare diseases from clinical notes, significantly improving precision without extensive manual annotation.
Contribution
It introduces a weak supervision framework combined with ontologies and contextual embeddings to enhance rare disease identification from clinical text, reducing reliance on expert annotations.
Findings
An improvement of over 30% to 50% in Text-to-UMLS linking precision.
Consistent results across multiple clinical datasets.
Effective extraction of rare disease cases beyond structured data.
Abstract
Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and because data annotation requires domain experts. We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-based framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in the Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed…
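The two-step framework described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Mention` class, the `weak_label` rule (a simple mention-length check), the `identify_rare_diseases` function, and the placeholder identifiers such as `CUI_0001` and `ORDO_0001`, which are not real UMLS or ORDO codes. In the paper, rules of this kind produce weak labels for training a phenotype confirmation model over contextual representations; here the rule is applied directly as a stand-in for that trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Mention:
    text: str     # surface form found in the clinical note
    cui: str      # UMLS concept assigned by an NER+L tool (placeholder ID)
    context: str  # text window around the mention


# Hypothetical UMLS-to-ORDO mapping; the IDs are placeholders, not real codes.
UMLS_TO_ORDO = {"CUI_0001": "ORDO_0001"}


def weak_label(m: Mention) -> bool:
    """Illustrative rule (an assumption): very short mentions are often
    ambiguous abbreviations, so treat them as weak negatives."""
    return len(m.text) > 3


def identify_rare_diseases(
    mentions: List[Mention],
    confirm: Callable[[Mention], bool],
) -> List[Tuple[str, str, str]]:
    """Step (i): keep mentions accepted by the confirmation function.
    Step (ii): map the linked UMLS concept to an ORDO rare disease."""
    results = []
    for m in mentions:
        if not confirm(m):
            continue  # rejected as a likely false-positive link
        ordo_id = UMLS_TO_ORDO.get(m.cui)
        if ordo_id is not None:  # concept corresponds to a rare disease
            results.append((m.text, m.cui, ordo_id))
    return results


# Toy input: the abbreviation "HD" is linked to the same concept as the
# full disease name, but its context suggests a different meaning.
mentions = [
    Mention("HD", "CUI_0001", "HD was performed three times a week"),
    Mention("Huntington disease", "CUI_0001", "family history of Huntington disease"),
    Mention("pneumonia", "CUI_0002", "treated for pneumonia"),
]
found = identify_rare_diseases(mentions, confirm=weak_label)
print(found)
```

In this toy run the short abbreviation is filtered out in step (i), and the mention whose concept has no ORDO match is dropped in step (ii). A real system would replace `weak_label` with a classifier trained on the weakly labelled mentions.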
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Genomics and Rare Diseases
Methods: Ontology
