PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer
Jiayu Shang, Cheng Peng, Xubo Tang, Yanni Sun

TL;DR
PhaVIP introduces a novel computational approach that uses chaos game representation and a Vision Transformer to accurately classify phage virion proteins, aiding microbiome research and phage taxonomy.
Contribution
This work is the first to combine chaos game representation with a Vision Transformer to classify phage virion proteins, and it outperforms existing methods.
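The pipeline described above turns a protein sequence into an image and then feeds that image to a Vision Transformer. The following is a minimal sketch of those two preprocessing steps. It is not PhaVIP's code, and its layout is assumed: each of the 20 standard amino acids sits on a vertex of a regular 20-gon, the contraction ratio is 0.5, and visits are counted on a small grid. It then splits the resulting image into flattened, non-overlapping patches, which is what a Vision Transformer does before its linear patch embedding. The vertex layout, ratio, grid size, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: the 20 standard amino acids, one per vertex of a
# regular 20-gon centred in the unit square. PhaVIP's real encoding may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def polygon_vertices(n: int) -> List[Tuple[float, float]]:
    """Vertices of a regular n-gon centred at (0.5, 0.5) with radius 0.5."""
    return [
        (0.5 + 0.5 * math.cos(2 * math.pi * i / n),
         0.5 + 0.5 * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


VERTICES = dict(zip(AMINO_ACIDS, polygon_vertices(len(AMINO_ACIDS))))


def cgr_points(seq: str, ratio: float = 0.5) -> List[Tuple[float, float]]:
    """Chaos game: start at the centre and move `ratio` of the way toward each residue's vertex."""
    x, y = 0.5, 0.5
    points = []
    for aa in seq.upper():
        if aa not in VERTICES:  # skip ambiguous residues such as X or B
            continue
        vx, vy = VERTICES[aa]
        x, y = x + ratio * (vx - x), y + ratio * (vy - y)
        points.append((x, y))
    return points


def cgr_image(seq: str, size: int = 16, ratio: float = 0.5) -> List[List[int]]:
    """Rasterize the CGR points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq, ratio):
        col = min(int(x * size), size - 1)  # clamp points lying exactly on the edge
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


def to_patches(image: List[List[int]], patch: int) -> List[List[int]]:
    """Split a square image into non-overlapping patch x patch tiles, each flattened row-major."""
    size = len(image)
    assert size % patch == 0, "image side must be divisible by the patch size"
    patches = []
    for r0 in range(0, size, patch):
        for c0 in range(0, size, patch):
            patches.append([image[r][c]
                            for r in range(r0, r0 + patch)
                            for c in range(c0, c0 + patch)])
    return patches


if __name__ == "__main__":
    img = cgr_image("MKTAYIAKQRQISFVKSHFSRQ", size=16)
    tokens = to_patches(img, patch=4)
    print(len(tokens), len(tokens[0]))  # 16 patches of 16 values each
```

In a full model, each flattened patch would be linearly projected and given a position embedding before entering the multi-head attention layers. Larger grids or normalized frequencies instead of raw counts are equally plausible choices.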
Findings
PhaVIP outperforms alternative tools in protein classification accuracy.
Using PhaVIP improves phage taxonomy classification.
Classified proteins enhance phage host prediction results.
Abstract
Motivation: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages from different microbiomes at low cost. However, despite the rapid accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins such as the major tail and baseplate proteins. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand for a computational method for fast and accurate phage virion protein classification.
Results: In this work, we adapted the state-of-the-art…
Taxonomy
Topics: Bacteriophages and microbial interactions · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Adam · Layer Normalization · Label Smoothing · Residual Connection · Dense Connections · Position-Wise Feed-Forward Layer
