PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology
Ling Luo, Shankai Yan, Po-Ting Lai, Daniel Veltri, Andrew Oler, Sandhya Xirasagar, Rajarshi Ghosh, Morgan Similuk, Peter N. Robinson, Zhiyong Lu

TL;DR
PhenoTagger is a hybrid approach combining dictionary matching and deep learning to improve phenotype concept recognition in biomedical texts, reducing the need for manual annotation.
Contribution
It introduces a novel hybrid method that leverages distant supervision and deep learning for phenotype recognition, enhancing performance without extensive manual annotation.
Findings
Outperforms previous methods on HPO corpora
Achieves competitive results without manual training data
Demonstrates generalizability to disease ontology MEDIC
Abstract
Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation. In this paper, we propose PhenoTagger, a hybrid method that combines both dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is…
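The dictionary-matching side of the abstract can be illustrated with a minimal sketch. This is not PhenoTagger's implementation: the function names (`tokenize`, `dictionary_match`), the greedy longest-match strategy, and the four-entry dictionary are all assumptions made for illustration, and the dictionary is a hand-picked toy sample rather than a real HPO export. It shows why exact lookup tends toward high precision but lower recall: a listed synonym is found, while a paraphrase missing from the dictionary is silently skipped.

```python
import re

# Toy stand-in for an HPO-derived dictionary: surface form -> concept ID.
# Hypothetical hand-picked sample, not the real ontology export.
HPO_DICT = {
    "seizure": "HP:0001250",
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
    "small head": "HP:0000252",
}

MAX_NGRAM = 3  # longest phrase to try, in tokens (assumption)


def tokenize(text):
    """Return lowercase word tokens with their character offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def dictionary_match(text, dictionary=HPO_DICT, max_ngram=MAX_NGRAM):
    """Greedy longest-match lookup of token n-grams in the dictionary."""
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if phrase in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                hits.append((start, end, text[start:end], dictionary[phrase]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return hits


sentence = "The patient had a small head and recurrent convulsions."
matches = dictionary_match(sentence)
for start, end, mention, concept_id in matches:
    print(start, end, mention, concept_id)
```

In this sketch "small head" is matched because it is listed verbatim, whereas "convulsions" is missed because this toy dictionary has no such entry. That kind of gap is what the learned component of a hybrid method is meant to cover.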
Taxonomy
Methods: Hyper-parameter optimization
