Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis

Junyoung Kim (1), Jingye Yang (2, 4), Kai Wang (2, 3), Chunhua Weng (1), Cong Liu (1)

(1) Department of Biomedical Informatics, Columbia University, New York, NY, USA
(2) Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, USA
(3) Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, USA
(4) Department of Mathematics, University of Pennsylvania, Philadelphia, USA

arXiv:2403.14801 · q-bio.QM · April 4, 2024 · 1 citation

Open Access

TL;DR

This study evaluates the performance of large language models in phenotype-driven gene prioritization for rare genetic disorder diagnosis, revealing their current limitations and potential for integration into genomic analysis workflows.

Contribution

It provides a comprehensive assessment of multiple LLMs in gene prioritization tasks, highlighting their strengths, limitations, and the impact of prompt complexity and model size.

Findings

01. GPT-4 achieved 16.0% accuracy, still below traditional tools.

02. Prediction accuracy improves with larger models.

03. Complex prompts increase task completeness but may reduce structure compliance.

Abstract

Phenotype-driven gene prioritization is a critical step in diagnosing rare genetic disorders: it identifies and ranks potential disease-causing genes based on observed physical traits, or phenotypes. While traditional approaches rely on curated knowledge graphs of phenotype-gene relations, recent advances in large language models (LLMs), which are trained extensively on diverse corpora with complex architectures, have opened the door to AI-based prediction. This study conducted a comprehensive evaluation of five LLMs, two from the Generative Pre-trained Transformer (GPT) series and three from the Llama2 series, assessing their performance on three key metrics: task completeness, gene prediction accuracy, and adherence to the required output structure. Various experiments explored combinations of models, prompts, input types, and task difficulty levels. Our findings reveal…
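The three metrics named in the abstract can be illustrated with a minimal sketch. Nothing here is taken from the paper's code: the function names `score_response` and `summarize`, the gene-symbol pattern, and the assumed required structure (a bare comma-separated list of symbols) are all hypothetical choices made for illustration, and the paper's own definitions may differ.

```python
import re

# Hypothetical pattern for a gene-symbol-like token (e.g. "FBN1", "COL1A1").
# It is deliberately crude: other all-caps words such as "DNA" also match.
GENE_TOKEN = re.compile(r"\b[A-Z][A-Z0-9-]{1,9}\b")

# Hypothetical required structure: gene symbols separated by commas, nothing else.
STRICT_LIST = re.compile(
    r"^\s*[A-Z][A-Z0-9-]{1,9}(\s*,\s*[A-Z][A-Z0-9-]{1,9})*\s*$"
)


def score_response(response, true_gene, top_k):
    """Score one model response on the three metrics (assumed definitions)."""
    genes = GENE_TOKEN.findall(response)[:top_k]
    return {
        # Task completeness: the model returned the requested number of genes.
        "complete": len(genes) == top_k,
        # Prediction accuracy: the diagnosed gene is among the top-k predictions.
        "accurate": true_gene in genes,
        # Structure compliance: the response is exactly the requested list format.
        "structured": bool(STRICT_LIST.match(response)),
    }


def summarize(scores):
    """Average each metric over a list of per-response score dictionaries."""
    n = len(scores)
    return {key: sum(s[key] for s in scores) / n
            for key in ("complete", "accurate", "structured")}


if __name__ == "__main__":
    verbose = score_response(
        "Sure! The top genes are FBN1, TGFBR2 and COL1A1.", "TGFBR2", top_k=3
    )
    terse = score_response("FBN1, TGFBR2", "FBN1", top_k=3)
    print(verbose, terse, summarize([verbose, terse]))
```

Scoring the metrics separately makes it possible for a response to be accurate yet non-compliant in structure (the verbose reply above), or compliant yet incomplete (the terse one).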

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Genomics and Rare Diseases · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Multi-Head Attention · Softmax · Dropout