PhaGO: Protein function annotation for bacteriophages by integrating the genomic context
Jiaojiao Guan, Yongxin Ji, Cheng Peng, Wei Zou, Xubo Tang, Jiayu Shang, Yanni Sun

TL;DR
PhaGO is a novel tool that leverages genomic context and advanced embedding models to improve protein function annotation in bacteriophages, especially for diverged and uncommon proteins, aiding in understanding phage biology.
Contribution
This work introduces PhaGO, a new annotation method that integrates genomic modularity and Transformer-based embeddings to enhance phage protein function prediction.
Findings
Outperforms existing methods, with improvements of 6.78% and 13.05% in annotating diverged proteins and uncommon proteins, respectively.
Can annotate proteins without homology search results, addressing a key challenge in phage genomics.
Successfully identified 688 potential holins, demonstrating practical utility.
Abstract
Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated ones. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages by leveraging the modular genomic structure of phage genomes. By employing embeddings from the latest protein foundation models and Transformer to capture contextual information between proteins in phage genomes, PhaGO…
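The idea of capturing contextual information between proteins in a genome can be illustrated with a small sketch. This is not PhaGO's implementation: it assumes each protein already has a fixed-length embedding (the toy 4-dimensional vectors below are hypothetical stand-ins for foundation-model embeddings), and it uses a single attention head with identity projections instead of a trained Transformer. It only shows how self-attention lets each protein's representation draw on its neighbours in the same contig.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the raw embeddings
    (identity projections); a trained model would learn these projections.
    Returns the context-mixed embeddings and the attention weights.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    contextual, all_weights = [], []
    for query in embeddings:
        # Similarity of this protein to every protein in the contig.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in embeddings]
        weights = softmax(scores)
        # Weighted average of all protein embeddings.
        mixed = [sum(w * value[d] for w, value in zip(weights, embeddings))
                 for d in range(dim)]
        contextual.append(mixed)
        all_weights.append(weights)
    return contextual, all_weights


# Hypothetical contig of three proteins in genome order.
contig = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
contextual, weights = self_attention(contig)
```

In this toy input the first two proteins have similar embeddings, so the first protein attends more strongly to the second than to the third. A real model would also add positional information and feed the contextual vectors to a classifier over function labels.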
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks
Methods: Linear Layer · Layer Normalization · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
