# ProtFun: a protein function prediction model using graph attention networks with a protein large language model

**Authors:** Muhammed Talo, Serdar Bozdag

PMC · DOI: 10.1093/bioadv/vbaf245 · Bioinformatics Advances · 2025-10-11

## TL;DR

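ProtFun predicts protein functions by applying graph attention networks to a protein family network whose nodes carry protein language model embeddings, then combining the learned embeddings with InterPro signature representations. On three benchmark datasets it outperformed current state-of-the-art methods in most cases.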

## Contribution

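The contribution is a multimodal deep learning architecture that uses protein language model embeddings as node features in a protein family network, learns protein embeddings with graph attention networks, and integrates them with InterPro protein signature representations for function prediction.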

## Key findings

- Across three benchmark datasets, ProtFun outperformed current state-of-the-art methods in most cases.
- An ablation study highlighted the importance of the different components of ProtFun.

## Abstract

Understanding protein functions facilitates the identification of the underlying causes of many diseases and guides research into new therapeutic targets and medications. With the advancement of high-throughput technologies, obtaining novel protein sequences has become a routine process. However, determining protein functions experimentally is cost- and labor-prohibitive. Therefore, it is crucial to develop computational methods for automatic protein function prediction.

In this study, we propose a multimodal deep learning architecture called ProtFun to predict protein functions. ProtFun integrates protein large language model embeddings as node features in a protein family network. Employing graph attention networks on this protein family network, ProtFun learns protein embeddings, which are integrated with protein signature representations from InterPro to train a protein function prediction model. We evaluated our architecture using three benchmark datasets. Our results showed that our proposed approach outperformed current state-of-the-art methods in most cases. An ablation study also highlighted the importance of the different components of ProtFun.

The data and source code of ProtFun are available at https://github.com/bozdaglab/ProtFun under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
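
The pipeline described in the abstract (language model embeddings as node features, graph attention over a protein family network, fusion with InterPro signatures, then function prediction) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy graph, all dimensions, the random stand-ins for embeddings and signatures, the untrained weights, the single attention head, the use of concatenation for fusion, and the sigmoid multi-label output are assumptions made only to show the data flow.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gat_layer(H, A, W, a_src, a_dst):
    """Single-head graph attention layer (GAT-style).

    H: (n, d_in) node features; A: (n, n) 0/1 adjacency with self-loops;
    W: (d_in, d_out) projection; a_src, a_dst: (d_out,) attention vectors.
    Returns the updated node embeddings and the attention matrix.
    """
    Z = H @ W                                     # project node features
    # e[i, j] = LeakyReLU(a_src . z_i + a_dst . z_j)
    e = leaky_relu((Z @ a_src)[:, None] + (Z @ a_dst)[None, :])
    e = np.where(A > 0, e, -np.inf)               # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)          # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha

rng = np.random.default_rng(0)
# Hypothetical sizes, chosen only for illustration.
n_proteins, d_plm, d_hidden, n_signatures, n_labels = 5, 16, 8, 6, 4

# Stand-in for protein language model embeddings (random here).
plm_embeddings = rng.normal(size=(n_proteins, d_plm))

# Toy protein family network: an edge links two related proteins.
A = np.eye(n_proteins)                            # self-loops
for i, j in [(0, 1), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

# Stand-in for InterPro signatures: binary presence/absence per protein.
interpro = rng.integers(0, 2, size=(n_proteins, n_signatures)).astype(float)

# Untrained random weights; a real model would learn these.
W = rng.normal(scale=0.1, size=(d_plm, d_hidden))
a_src = rng.normal(scale=0.1, size=d_hidden)
a_dst = rng.normal(scale=0.1, size=d_hidden)
graph_emb, alpha = gat_layer(plm_embeddings, A, W, a_src, a_dst)

# Assumed fusion step: concatenate graph embeddings with signature vectors.
fused = np.concatenate([graph_emb, interpro], axis=1)
W_out = rng.normal(scale=0.1, size=(fused.shape[1], n_labels))
scores = sigmoid(fused @ W_out)                   # one score per function label

print(scores.shape)
```

Each row of `alpha` sums to one and is zero outside a protein's neighbourhood, so every protein's graph embedding is a weighted average over itself and its family neighbours. For the architecture actually used, see the repository linked above.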

## Full-text entities

- **Genes:** TTC41P (tetratricopeptide repeat domain 41, pseudogene) [NCBI Gene 253724] {aka GNN, GNNP}
- **Diseases:** AUPRC (MESH:D011855), PFN (MESH:D011488)
- **Chemicals:** CCO (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Homo sapiens (human, species) [taxon 9606], Drosophila melanogaster (fruit fly, species) [taxon 7227]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12571506/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12571506/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12571506/full.md

---
Source: https://tomesphere.com/paper/PMC12571506