A Simplified Retriever to Improve Accuracy of Phenotype Normalizations by Large Language Models
Daniel B. Hier, Thanh Son Do, Tayo Obafemi-Ajayi

TL;DR
This paper presents a simplified BioBERT-based retriever that significantly improves phenotype normalization accuracy of large language models by efficiently searching the Human Phenotype Ontology without explicit term definitions.
Contribution
The work introduces a streamlined retrieval method that uses BioBERT embeddings to suggest candidate HPO terms, improving LLM accuracy in phenotype normalization without requiring explicit term definitions or more complex retrieval methods.
Findings
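Normalization accuracy of a state-of-the-art LLM increased from 62.3% without augmentation to 90.3% with retriever augmentation.
The method was tested on terms derived from OMIM clinical synopses.
The approach is potentially generalizable to other biomedical term normalization tasks.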
Abstract
Large language models (LLMs) have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances LLM accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM), we demonstrate that the normalization accuracy of a state-of-the-art LLM increases from a baseline of 62.3% without augmentation to 90.3% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.
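The retrieval step described in the abstract can be illustrated with a minimal sketch: embed the input term, rank ontology labels by cosine similarity, and pass the top candidates to the LLM in its prompt. This is not the authors' implementation. To keep the example runnable without downloading a model, a toy character-trigram vector stands in for BioBERT contextual embeddings, the ontology is a tiny hand-picked subset of HPO, and the names `embed`, `cosine`, `retrieve_candidates`, `build_prompt`, and `HPO_SUBSET` are all hypothetical.

```python
import math
from collections import Counter

# Illustrative subset only; a real retriever would index the full HPO release.
HPO_SUBSET = {
    "HP:0001250": "Seizure",
    "HP:0000252": "Microcephaly",
    "HP:0001263": "Global developmental delay",
    "HP:0000365": "Hearing impairment",
    "HP:0001249": "Intellectual disability",
}


def embed(text):
    """ASSUMPTION: toy stand-in for a BioBERT embedding.

    Returns a sparse character-trigram count vector so the sketch runs
    without any model weights. It captures spelling overlap only, not meaning.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(term, ontology, k=3):
    """Return the k ontology entries whose labels are most similar to `term`."""
    query = embed(term)
    scored = [
        (cosine(query, embed(label)), hpo_id, label)
        for hpo_id, label in ontology.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(hpo_id, label) for _, hpo_id, label in scored[:k]]


def build_prompt(term, candidates):
    """Format the retrieved candidates as context for the LLM (hypothetical wording)."""
    options = "\n".join(f"- {hpo_id} {label}" for hpo_id, label in candidates)
    return (
        f"Normalize the phenotype term '{term}' to one HPO concept.\n"
        f"Candidate matches suggested by the retriever:\n{options}\n"
        "Answer with the best-matching HPO ID."
    )


candidates = retrieve_candidates("Seizures", HPO_SUBSET, k=3)
prompt = build_prompt("Seizures", candidates)
print(prompt)
```

The trigram stand-in only matches terms that share spelling, so it would miss a synonym such as "small head" for Microcephaly. Closing that gap is the role contextual embeddings such as BioBERT's play in the approach described above. Swapping `embed` for a function that returns dense model vectors would leave the ranking and prompt-building logic unchanged, provided `cosine` is adapted to dense inputs.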
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Gene expression and cancer classification · Bioinformatics and Genomic Networks
Methods: Ontology
