PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings
Ziyang Xu, Haitian Zhong, Bingrui He, Xueying Wang, Tianchi Lu

TL;DR
PTransIPs is a deep learning framework that combines protein pre-trained language model (PLM) embeddings with a Transformer architecture to identify phosphorylation sites. It outperforms existing methods on independent tests and can serve as a general framework for peptide bioactivity prediction.
Contribution
PTransIPs is the first to apply protein pre-trained language model (PLM) embeddings to phosphorylation site prediction, improving performance and mitigating dataset limitations.
Findings
Achieved AUCs of 0.9232 and 0.9660 on independent testing for phosphorylated S/T and Y sites, respectively.
Outperforms existing state-of-the-art methods.
Serves as a universal framework for peptide bioactivity tasks.
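The AUC figures reported here are areas under the ROC curve. As a minimal sketch of what that number measures, the snippet below computes AUC as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = phosphorylated site, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of examples, which is fine for illustration. For real evaluation a rank-based implementation from an established library would be the usual choice.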
Abstract
Phosphorylation is pivotal in numerous fundamental cellular processes and plays a significant role in the onset and progression of various diseases. The accurate identification of these phosphorylation sites is crucial for unraveling the molecular mechanisms within cells and during viral infections, potentially leading to the discovery of novel therapeutic targets. In this study, we develop PTransIPs, a new deep learning framework for the identification of phosphorylation sites. Independent testing results demonstrate that PTransIPs outperforms existing state-of-the-art (SOTA) methods, achieving AUCs of 0.9232 and 0.9660 for the identification of phosphorylated S/T and Y sites, respectively. PTransIPs makes three contributions. 1) PTransIPs is the first to apply protein pre-trained language model (PLM) embeddings to this task. It utilizes ProtTrans and EMBER2 to extract sequence…
Taxonomy
Topics: Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Label Smoothing · Linear Layer · Adam · Residual Connection · Dense Connections · Dropout · Absolute Position Encodings
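
Of the methods listed, label smoothing is simple enough to sketch in a few lines. The snippet below shows one common formulation for a two-class problem, where hard 0/1 targets are pulled toward 0.5 by a factor `eps` before the cross-entropy loss is computed. The helper names, the value `eps=0.1`, and the toy predictions are assumptions for illustration; this page does not state how PTransIPs configures label smoothing.

```python
import math


def smooth_labels(labels, eps=0.1):
    """Two-class label smoothing: 1 -> 1 - eps/2, 0 -> eps/2."""
    return [y * (1.0 - eps) + eps / 2.0 for y in labels]


def binary_cross_entropy(targets, probs):
    """Mean binary cross-entropy between targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


# Hypothetical hard labels and very confident predictions.
hard = [1, 0, 1]
soft = smooth_labels(hard, eps=0.1)
confident = [0.999, 0.001, 0.999]

loss_hard = binary_cross_entropy(hard, confident)
loss_soft = binary_cross_entropy(soft, confident)
print(soft, loss_hard, loss_soft)
```

With smoothed targets, extremely confident predictions incur a larger loss than they do against hard labels, which discourages overconfidence on small datasets.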
