The gene function prediction challenge: large language models and knowledge graphs to the rescue

Rohan Shawn Sunil; Shan Chun Lim; Manoj Itharajula; Marek Mutwil

arXiv:2408.07222·q-bio.MN·August 15, 2024

TL;DR

This paper reviews how recent AI advances, including large language models and knowledge graphs, can significantly improve gene function prediction in plant science, addressing current limitations in experimental validation.

Contribution

It highlights the potential of integrating AI techniques like large language models and knowledge graphs to enhance gene function prediction methods.

Findings

01

AI methods can accelerate gene function annotation.

02

Knowledge graphs improve data integration for gene analysis.

03

Large language models assist in literature-based gene function inference.

Abstract

Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of gene function prediction methods nor the number of experimentally characterised genes has increased dramatically in recent years. In this review, we discuss the status quo and the trajectory of gene function elucidation and outline recent advances in gene function prediction approaches. We then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function prediction and to help researchers keep up to date with the scientific literature.

Taxonomy

Topics: Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics