ProteoKnight: Convolution-based phage virion protein classification and uncertainty analysis
Samiha Afaf Neha, Abir Ahammed Bhuiyan, Md. Ishrak Khan

TL;DR
ProteoKnight introduces a new image-based encoding of protein sequences for phage virion protein (PVP) classification with pre-trained CNNs. It achieves high accuracy and estimates prediction uncertainty through Monte Carlo Dropout, which makes the resulting annotations more reliable.
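Monte Carlo Dropout keeps dropout active at inference time and runs many stochastic forward passes over the same input. The spread of the resulting predictions serves as an uncertainty signal. The sketch below shows the idea on a toy one-hidden-layer network with random weights, which stands in for the pre-trained CNN. The function name `mc_dropout_predict`, the dropout rate, the number of passes, and the use of predictive entropy as the uncertainty score are all illustrative assumptions. They are not the paper's configuration.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, p_drop=0.5, n_samples=100, seed=0):
    """Monte Carlo Dropout prediction for a toy binary classifier.

    A one-hidden-layer MLP stands in for the CNN (an assumption). Dropout stays
    active at inference: every forward pass draws a fresh mask over the hidden
    units. Returns the mean probability, its standard deviation across passes,
    and the predictive entropy (in nats) of the averaged prediction.
    """
    rng = np.random.default_rng(seed)
    probs = np.empty(n_samples)
    for t in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)          # ReLU hidden layer
        keep = rng.random(h.shape) >= p_drop      # keep each unit with prob 1 - p_drop
        h = h * keep / (1.0 - p_drop)             # inverted-dropout rescaling
        logit = h @ w2 + b2
        probs[t] = 1.0 / (1.0 + np.exp(-logit))   # sigmoid -> P(class = PVP)
    mean_p = probs.mean()
    eps = 1e-12                                   # guards against log(0)
    entropy = -(mean_p * np.log(mean_p + eps)
                + (1.0 - mean_p) * np.log(1.0 - mean_p + eps))
    return mean_p, probs.std(), entropy


# Usage with random weights (illustrative only, not a trained model).
rng = np.random.default_rng(42)
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0
x = rng.normal(size=8)
mean_p, std_p, ent = mc_dropout_predict(x, w1, b1, w2, b2)
print(f"P(PVP) = {mean_p:.3f}, std = {std_p:.3f}, entropy = {ent:.3f}")
```

A larger standard deviation or entropy marks a prediction that deserves less confidence. Setting `p_drop=0.0` collapses all passes to the same output, which is a quick sanity check that the spread really comes from dropout.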
Contribution
The paper presents a new encoding method for protein sequences that retains more spatial information than existing image encodings. It then applies pre-trained CNNs to the resulting images for accurate PVP classification with uncertainty analysis.
Findings
Achieved 90.8% accuracy in binary PVP classification.
The encoding outperforms frequency chaos game representation (FCGR) by reducing spatial information loss.
Uncertainty analysis shows that prediction confidence varies with protein class and sequence length.
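Some context on the FCGR comparison helps here. Frequency chaos game representation maps every k-mer of a sequence to one cell of a 2^k × 2^k grid and stores its count. Two sequences with the same k-mer counts therefore produce identical images no matter where those k-mers occur, which is one way to see the positional information FCGR discards. The sketch below shows the classic nucleotide version. The corner assignment is one common convention and the name `fcgr` is hypothetical. The summary above does not say how FCGR was applied to protein sequences in the paper's comparison.

```python
def fcgr(seq, k):
    """Frequency chaos game representation of a DNA sequence.

    Returns a 2**k x 2**k list of lists holding k-mer counts. Corner
    convention (an assumption; several exist): A=(0,0), C=(0,1), G=(1,1), T=(1,0).
    """
    corner = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in corner for ch in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        # In the CGR iteration p <- (p + corner) / 2, later characters dominate
        # the final coordinate, so the i-th character of the k-mer sets bit i.
        for i, ch in enumerate(kmer):
            cx, cy = corner[ch]
            x |= cx << i
            y |= cy << i
        grid[y][x] += 1
    return grid


g = fcgr("ATGCGATACGCTTGA", 2)
print(sum(map(sum, g)))  # 14 two-mers counted

# Same k-mer counts, different order: the FCGR images are identical.
print(fcgr("AACC", 1) == fcgr("CACA", 1))  # True
```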
Abstract
Introduction: Accurate prediction of phage virion proteins (PVPs) is essential for genomic studies because these proteins are the structural elements of bacteriophages. Computational tools, particularly machine-learning models, have emerged for annotating phage protein sequences obtained through high-throughput sequencing. However, effective annotation requires specialized sequence encodings. Our paper introduces ProteoKnight, a new image-based encoding method that addresses the spatial constraints of existing techniques and yields competitive performance in PVP classification using pre-trained convolutional neural networks. Our study also evaluates prediction uncertainty in binary PVP classification through Monte Carlo Dropout (MCD).
Methods: ProteoKnight adapts the classical DNA-Walk algorithm to protein sequences, incorporating pixel colors and adjusting walk distances to capture…
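The classical DNA walk that ProteoKnight adapts turns each residue into a step and traces the resulting path, so the image follows the sequence order step by step. The sketch below is minimal and makes several assumptions. It uses a 2D walk with one fixed direction per nucleotide (an assumed convention) and rasterizes the path into a visit-count image. The name `dna_walk_2d` is hypothetical. ProteoKnight's protein mapping, pixel colors, and walk distances are cut off in the abstract above and are not reproduced here.

```python
def dna_walk_2d(seq, steps=None):
    """Trace a 2D walk over `seq` and rasterise it into a visit-count image.

    Each symbol moves the walker by its step vector; pixel values count how
    often the walk visits each lattice point (the start point included).
    """
    if steps is None:
        # Assumed direction convention for nucleotides (several exist).
        steps = {"A": (0, 1), "T": (0, -1), "C": (1, 0), "G": (-1, 0)}
    x = y = 0
    path = [(x, y)]
    for ch in seq.upper():
        if ch not in steps:
            continue  # skip symbols with no assigned direction
        dx, dy = steps[ch]
        x, y = x + dx, y + dy
        path.append((x, y))

    xs = [px for px, _ in path]
    ys = [py for _, py in path]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    img = [[0] * (max_x - min_x + 1) for _ in range(max_y - min_y + 1)]
    for px, py in path:
        # Row 0 is the top of the image, so the y axis is flipped.
        img[max_y - py][px - min_x] += 1
    return img


img = dna_walk_2d("ACGT")
print(img)  # [[2, 1], [2, 0]]
```

A protein adaptation would plug in through the `steps` argument, for example a dictionary keyed by the 20 amino acids or one with longer step vectors. Any such mapping is hypothetical here and is not the authors' scheme.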
Taxonomy
Topics: Machine Learning in Bioinformatics · Fractal and DNA sequence analysis · Bacteriophages and microbial interactions
