# GO2Sum: generating human-readable functional summary of proteins from GO terms

**Authors:** Swagarika Jaharlal Giri, Nabil Ibtehaz, Daisuke Kihara

PMC · DOI: 10.1038/s41540-024-00358-0 · 2024-03-15

## TL;DR

GO2Sum fine-tunes the T5 language model to turn a protein's list of Gene Ontology (GO) terms into a short, human-readable functional summary for biologists.

## Contribution

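GO2Sum is a model that takes a set of Gene Ontology (GO) terms as input and generates a human-readable summary of protein function. It was built by fine-tuning T5 on UniProt GO annotations paired with free-text function descriptions.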

## Key findings

- GO2Sum significantly outperforms the original T5 model, which was trained on the entire web corpus, at recreating function descriptions for UniProt entries.
- The model was fine-tuned on UniProt entries, pairing each entry's GO term assignments with its free-text function descriptions.
- GO2Sum generates three kinds of UniProt paragraphs from GO terms: Function, Subunit Structure, and Pathway.
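The findings do not say here how the generated paragraphs were scored against UniProt text. As an illustration only (an assumption, not necessarily the paper's metric), a common way to compare a generated summary with a reference is unigram-overlap ROUGE-1. The sketch below is a simplified version with lowercase whitespace tokens and no stemming; `rouge1_f1` and `mean_score` are hypothetical helpers, not GO2Sum code.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between generated and reference text.

    Assumption for illustration: lowercase, whitespace tokenization, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_score(outputs: list[str], references: list[str]) -> float:
    """Average ROUGE-1 F1 over paired model outputs and reference paragraphs."""
    if len(outputs) != len(references) or not outputs:
        raise ValueError("need equally sized, non-empty lists")
    return sum(rouge1_f1(o, r) for o, r in zip(outputs, references)) / len(outputs)


# Hypothetical example strings, not taken from UniProt.
print(rouge1_f1("catalyzes ATP hydrolysis", "catalyzes the hydrolysis of ATP"))  # ≈ 0.75
```

Comparing two models this way means computing `mean_score` for each model's outputs over the same held-out references; a full evaluation would typically also use bigram or longest-common-subsequence variants.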

## Abstract

Understanding the biological functions of proteins is of fundamental importance in modern biology. Gene Ontology (GO), a controlled vocabulary, is frequently used to represent protein functions because it is easy for computer programs to handle and avoids open-ended text interpretation. In particular, most current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describe a protein's function can be difficult for biologists to interpret. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions from concatenated GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model, which was trained on the entire web corpus, in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.
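The abstract describes the model input as a concatenation of GO term descriptions, paired with a UniProt entry's free-text paragraph as the target. A minimal sketch of that data preparation follows, assuming GO term names stand in for the descriptions and terms are joined with "; " after sorting; the separator, ordering, `GO_NAMES`, `Entry`, and the helper functions are all hypothetical, not the authors' implementation. The resulting (source, target) pairs would then feed an ordinary T5 sequence-to-sequence fine-tuning loop, which is not shown.

```python
from dataclasses import dataclass

# Hypothetical GO ID -> term name lookup; a real pipeline would load these
# from the Gene Ontology release it uses.
GO_NAMES = {
    "GO:0005524": "ATP binding",
    "GO:0016887": "ATP hydrolysis activity",
    "GO:0005737": "cytoplasm",
}


@dataclass
class Entry:
    """A UniProt-style record: GO annotations plus one free-text paragraph."""
    accession: str
    go_ids: list[str]
    function_text: str


def go_input(go_ids: list[str], names: dict[str, str]) -> str:
    """Concatenate GO term descriptions into a single model input string.

    Sorting and de-duplicating makes the input independent of annotation
    order (an assumption of this sketch). Unknown GO IDs are skipped.
    """
    terms = sorted({names[g] for g in go_ids if g in names})
    return "; ".join(terms)


def training_pairs(entries: list[Entry], names: dict[str, str]) -> list[tuple[str, str]]:
    """Build (source, target) pairs for sequence-to-sequence fine-tuning."""
    pairs = []
    for e in entries:
        source = go_input(e.go_ids, names)
        # Skip entries with no usable GO terms or no reference text.
        if source and e.function_text:
            pairs.append((source, e.function_text))
    return pairs


# Hypothetical example record.
example = Entry("EXAMPLE1", ["GO:0016887", "GO:0005524"], "Hydrolyzes ATP.")
print(training_pairs([example], GO_NAMES))
# [('ATP binding; ATP hydrolysis activity', 'Hydrolyzes ATP.')]
```

The same construction would apply to Subunit Structure or Pathway paragraphs by swapping the target text; whether those share one model or use separate ones is not stated here.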

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10943200/full.md

---
Source: https://tomesphere.com/paper/PMC10943200