Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction
Mahdi Pourmirzaei, Farzaneh Esmaili, Salhuldin Alqarghuli, Mohammadreza Pourmirzaei, Ye Han, Kai Chen, Mohsen Rezaei, Duolin Wang, Dong Xu

TL;DR
Prot2Token is a versatile, unified framework that converts various protein prediction tasks into a next-token prediction format, enabling multi-task learning, faster predictions, and broad applicability across protein modeling tasks.
Contribution
The paper introduces Prot2Token, a novel unified autoregressive framework that recasts diverse protein prediction tasks as outputs of a single generative model, improving efficiency and generalization.
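The summary does not say exactly how labels become tokens. The following is a minimal sketch only. It assumes a hypothetical vocabulary (a `<bos>` start token, one `<task_...>` token per task, an `<eos>` end token) and hypothetical helper names. Under those assumptions, labels from different task types could be serialized into token sequences as follows: a class label becomes one token, a regression value is written out character by character, and residue-level annotations become a sorted list of positions. This illustrates the general idea and is not Prot2Token's actual tokenizer.

```python
from typing import List, Sequence

# Hypothetical special tokens; the paper's actual vocabulary may differ.
BOS = "<bos>"
EOS = "<eos>"


def task_prompt(task: str) -> List[str]:
    """Decoder prefix: a start token followed by one (hypothetical) task token."""
    return [BOS, f"<task_{task}>"]


def encode_class(label: str) -> List[str]:
    """Sequence-level classification: the label becomes a single token."""
    return [label, EOS]


def encode_regression(value: float, decimals: int = 2) -> List[str]:
    """Regression: write the rounded value out one character per token."""
    return list(f"{value:.{decimals}f}") + [EOS]


def encode_residue_sites(sites: Sequence[int]) -> List[str]:
    """Residue-level tasks: emit the 1-based positions of positive residues in order."""
    return [str(p) for p in sorted(set(sites))] + [EOS]


def decode_regression(tokens: Sequence[str]) -> float:
    """Invert encode_regression by joining tokens up to the end token."""
    body = []
    for tok in tokens:
        if tok == EOS:
            break
        body.append(tok)
    return float("".join(body))
```

For example, `task_prompt("stability") + encode_regression(-1.234)` yields `['<bos>', '<task_stability>', '-', '1', '.', '2', '3', '<eos>']`. Writing numbers as character tokens keeps regression inside the same output vocabulary as classification, at the cost of a fixed precision (two decimals here, an assumed value).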
Findings
Achieves up to 1000x speedup in 3D structure prediction compared to AlphaFold2.
Matches or surpasses specialized methods across multiple protein prediction tasks.
Demonstrates effective multi-task learning and improved performance with self-supervised pre-training.
Abstract
The diverse nature of protein prediction tasks has traditionally necessitated specialized models, hindering the development of broadly applicable and computationally efficient Protein Language Models (PLMs). In this work, we introduce Prot2Token, a unified framework that overcomes these challenges by converting a wide spectrum of protein-related predictions (from sequence-level properties and residue-specific attributes to complex inter-protein interactions) into a standardized next-token prediction format. At its core, Prot2Token employs an autoregressive decoder, conditioned on embeddings from pre-trained protein encoders and guided by learnable task tokens, to perform diverse predictions. This architecture uniquely facilitates multi-task learning, enabling general-purpose decoders to generalize across five distinct categories. We present extensive experimental validation across a…
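The abstract describes an autoregressive decoder conditioned on embeddings from a pre-trained protein encoder and guided by a task token. Below is a minimal greedy-decoding sketch of that inference loop. The `DecoderStep` interface, the `toy_step` stand-in, and all token names are hypothetical, and the toy scores replace a trained decoder. The encoder states are computed once and passed unchanged to every decoder step. The task token at the start of the prefix tells the decoder which task to perform.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"

# Hypothetical interface: one decoder step maps (encoder states, token prefix)
# to a score for every candidate next token.
DecoderStep = Callable[[Sequence[Sequence[float]], List[str]], Dict[str, float]]


def greedy_decode(
    encoder_states: Sequence[Sequence[float]],
    task_token: str,
    step: DecoderStep,
    max_len: int = 32,
) -> List[str]:
    """Greedy next-token decoding conditioned on fixed encoder states."""
    prefix = ["<bos>", task_token]  # the task token selects the task
    output: List[str] = []
    for _ in range(max_len):
        scores = step(encoder_states, prefix)
        token = max(scores, key=scores.get)  # highest-scoring next token
        if token == EOS:
            break
        output.append(token)
        prefix.append(token)  # feed the prediction back in (autoregression)
    return output


def toy_step(encoder_states: Sequence[Sequence[float]], prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder (ignores encoder_states): one label, then stop."""
    if prefix[-1] == "<task_localization>":
        return {"nucleus": 0.9, "cytoplasm": 0.1, EOS: 0.0}
    return {EOS: 1.0, "nucleus": 0.0, "cytoplasm": 0.0}


print(greedy_decode([[0.1, 0.2]], "<task_localization>", toy_step))  # ['nucleus']
```

Swapping the task token (for example to `<task_other>`) changes the output without touching the encoder states. Here `toy_step` then scores `<eos>` highest immediately, so the result is an empty list.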
Peer Reviews
Decision: Submitted to ICLR 2026
- Representing diverse downstream classification and regression tasks for protein sequences using a universal tokenization scheme is, in my opinion, interesting and novel.
- Performance improvements over prior methods are significant on some benchmarks.
1. To me, the very shortcoming that the paper is trying to address is its main weakness. As the authors allude to, a universal model that can support any given downstream task is computationally prohibitive. Therefore, the method has been limited to only a few downstream tasks.
2. Following (1), it seems that different versions of Prot2Token were trained on different (combinations of) downstream tasks (Prot2Token-A/B/C/D). Is my understanding correct? If so, it defeats the purpose of having a un…
- A good motivation: to build a unified model for many different protein tasks.
- Reasonable engineering effort to connect multiple existing components; for example, different downstream tasks are unified in one model.
- Consistent writing: the writing is clear.
- **The claim is big, but the paper is unable to support it**: the biggest issue is that this paper claims to take a huge step forward in the protein field by unifying different tasks into one model with some prompt tokens. However, the method section significantly lacks novelty. It merely states which modules were used to build the model, stacking them together without any deeper insight (either theoretical or empirical). Even worse, the performance tables show very limited baselines…
1. This paper addresses a challenging issue in the downstream application of PLMs: the need for an additional head for each task and architecture.
2. This paper leverages a straightforward idea that turns all training into a next-token prediction framework.
3. The experiments span multiple benchmarks, including both classification/regression-level and sequence-level tasks.
4. The writing is clear about the paper's goals and experimental setup.
My concern mainly lies in the conceptual depth of the "unified decoding" proposed in this paper, along with several minor issues.
1. The "unified decoding" claimed in this paper mostly comes from designing a token vocabulary for task tokens. Functionally, this isn't very different from instruction- or prefix-based tuning; viewed as learnable tokens, it's essentially another form of prompt learning. While the paper criticizes prior works for relying on prompt engineering, it ends up doing the same thin…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
