MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li

TL;DR
MeToken introduces a novel approach that combines sequence and structural information through micro-environment tokens to improve the prediction of post-translational modification sites and types in proteins.
Contribution
This work presents the MeToken model, which uniquely integrates protein structural context with sequence data using uniform sub-codebooks for enhanced PTM prediction.
Findings
MeToken outperforms existing methods in PTM site prediction accuracy.
Incorporating structural data significantly improves PTM type classification.
The model effectively handles rare PTM types through uniform sub-codebooks.
Abstract
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome, regulating protein attributes and interactions that are crucial for biological processes. Accurately predicting PTM sites and their specific types is therefore essential for elucidating protein function and understanding disease mechanisms. Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs. However, these approaches often overlook protein structural contexts. In this work, we first compile a large-scale sequence-structure PTM dataset, which serves as the foundation for fair comparison. We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens. This model not only…
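To make the idea of "discrete tokens" concrete, the sketch below shows generic nearest-neighbor vector quantization: each residue's micro-environment embedding is mapped to the index of its closest codebook entry. This is an illustration under stated assumptions, not the authors' implementation. The names `nearest_code` and `tokenize`, the 2-D embeddings, and the four-entry codebook are all hypothetical, and the uniform sub-codebook layout mentioned in the summary is not modeled here.

```python
import math


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = math.dist(vector, code)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def tokenize(embeddings, codebook):
    """Map each per-residue embedding to a discrete token id (its nearest code index)."""
    return [nearest_code(vector, codebook) for vector in embeddings]


# Hypothetical toy data: a 4-entry codebook and three residue embeddings in 2-D.
# In a real model both would be learned and much higher-dimensional.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
embeddings = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

tokens = tokenize(embeddings, codebook)
print(tokens)  # [0, 3, 2]
```

Once residues are represented as token ids, a downstream classifier can treat a protein as a sequence of discrete symbols that already encode local structural context, which is the property the abstract attributes to micro-environment tokens.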
Taxonomy
Topics: Computational Drug Discovery Methods · Protein Degradation and Inhibitors
