Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case
Jingqing Zhang, Luis Bolanos, Tong Li, Ashwani Tanwar, Guilherme Freire, Xian Yang, Julia Ive, Vibhor Gupta, Yike Guo

TL;DR
This paper introduces a self-supervised approach for detecting contextual synonyms in large multi-class settings, significantly improving phenotype annotation in clinical texts and outperforming existing models with minimal labeled data.
Contribution
The paper presents a novel self-supervised pre-training method for detecting contextual synonyms in a large multi-class setting, enhancing phenotype annotation accuracy in electronic health records.
Findings
Achieves a new SOTA for unsupervised phenotype annotation, improving F1 by up to 4.5 absolute points.
Fine-tuning with 20% labeled data surpasses BioBERT and ClinicalBERT.
Using annotated phenotypes improves ICU benchmark performance.
Abstract
Contextualised word embeddings are a powerful tool for detecting contextual synonyms. However, most current state-of-the-art (SOTA) deep learning concept extraction methods remain supervised and underexploit the potential of the context. In this paper, we propose a self-supervised pre-training approach that can detect contextual synonyms of concepts after being trained on data created by shallow matching. We apply our methodology in a sparse multi-class setting (over 15,000 concepts) to extract phenotype information from electronic health records. We further investigate data augmentation techniques to address the problem of class sparsity. Our approach achieves a new SOTA for unsupervised phenotype concept annotation on clinical text in both F1 and Recall, outperforming the previous SOTA by up to 4.5 and 4.0 absolute points, respectively. After fine-tuning with…
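The abstract mentions training on "data created by shallow matching" without giving details here. As a rough illustration only, the sketch below assumes shallow matching means exact, case-insensitive lookup of known concept names in a note. The lexicon, the concept IDs, the example note, and the `shallow_match` helper are all hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: placeholder concept ID -> known surface names.
# The paper's real setting covers over 15,000 concepts.
LEXICON: Dict[str, List[str]] = {
    "CONCEPT_FEVER": ["fever", "pyrexia"],
    "CONCEPT_DYSPNEA": ["dyspnea", "shortness of breath"],
}


def shallow_match(text: str, lexicon: Dict[str, List[str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, concept_id) spans for exact, case-insensitive name matches."""
    spans = []
    for concept_id, names in lexicon.items():
        for name in names:
            # Word boundaries stop a name from matching inside a longer word.
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((match.start(), match.end(), concept_id))
    return sorted(spans)


note = "Patient reports fever and shortness of breath; febrile overnight."
weak_labels = shallow_match(note, LEXICON)
for start, end, concept_id in weak_labels:
    print(concept_id, repr(note[start:end]))
```

In this toy note, "febrile" is not in the lexicon, so the lookup misses it. Mentions of that kind are the contextual synonyms that, according to the abstract, a model pre-trained on such weak labels is meant to pick up from context.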
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
