TL;DR
This paper introduces PAR, a multi-scale autoregressive framework that generates protein backbones coarse-to-fine, much like sculpting a statue. It combines hierarchical modeling, transformer encoding, and flow-based decoding.
Contribution
PAR is the first multi-scale autoregressive model for protein backbone generation, combining hierarchical structure representation, transformer encoding, and flow-based decoding with techniques to mitigate exposure bias.
Findings
PAR achieves high-quality protein backbone generation in unconditional benchmarks.
The model demonstrates strong zero-shot generalization and supports conditional generation and motif scaffolding.
PAR exhibits favorable scaling behavior and effectively learns protein distributions.
Abstract
We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature of proteins, PAR generates structures in a process that mimics sculpting a statue, forming a coarse topology and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, caused by the mismatch between the training and generation procedures, which substantially degrades…
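The multi-scale representation in component (i) can be illustrated with a minimal sketch. The abstract excerpt does not say which downsampling operator PAR uses, so average pooling over contiguous residue segments is an assumption here, and the names `downsample_backbone` and `multiscale_pyramid`, the scale lengths, and the random-walk toy backbone are all hypothetical.

```python
import numpy as np


def downsample_backbone(ca_coords: np.ndarray, target_len: int) -> np.ndarray:
    """Average-pool an (L, 3) array of C-alpha coordinates to (target_len, 3).

    Assumption: contiguous-segment mean pooling stands in for the paper's
    downsampling operation, which the excerpt does not specify.
    """
    n = ca_coords.shape[0]
    if not 1 <= target_len <= n:
        raise ValueError("target_len must be between 1 and the backbone length")
    # Split residue indices into target_len nearly equal contiguous segments.
    segments = np.array_split(np.arange(n), target_len)
    # Each coarse point is the centroid of the residues in its segment.
    return np.stack([ca_coords[idx].mean(axis=0) for idx in segments])


def multiscale_pyramid(ca_coords: np.ndarray, scale_lens) -> list:
    """Return coarse-to-fine views, ending with the full-resolution backbone."""
    scales = [downsample_backbone(ca_coords, m) for m in sorted(scale_lens)]
    scales.append(ca_coords.copy())
    return scales


# Toy backbone: a 64-step random walk (illustrative only, not a real protein).
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(size=(64, 3)), axis=0)

# Hypothetical scale lengths; the coarsest scale comes first.
pyramid = multiscale_pyramid(coords, scale_lens=(4, 16))
print([s.shape for s in pyramid])  # [(4, 3), (16, 3), (64, 3)]
```

In a coarse-to-fine setup like the one the abstract describes, an autoregressive model would consume such a list in order, predicting each scale conditioned on the coarser ones before it.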
Peer Reviews
Decision · Submitted to ICLR 2026
- High technical novelty. The ability to define different granularities of the coarse structure and then refine them is applicable to design tasks.
- Competitive performance. The paper demonstrates that the AR factorization of the problem works as desired without much loss of performance on any benchmark.
- Clear, concise, and well written.
- Performing the AR process without explicit tokenization, driven by the CA flow loss, is elegant.
- No formal motif benchmark. Even if the results are not SOTA, they would be interesting to see.
- No comparison of inference speed.
- The paper proposes the *first* multi-scale autoregressive model for protein backbone generation, integrating coarse-to-fine prediction within a single generative process.
- The introduction of *noisy context learning* and *scheduled sampling* provides a principled way to mitigate exposure bias and mismatch between training and inference, a common challenge in autoregressive models.
- PAR demonstrates strong *zero-shot generalization* and supports flexible conditional tasks such as motif scaffo
- Although the proposed multi-scale autoregressive formulation is conceptually novel, the paper does not clearly demonstrate *quantitative or qualitative advantages* over existing diffusion-based protein generative models in terms of either *generation quality* or *generation efficiency*.
- In the *zero-shot generalization* experiments, the paper mainly focuses on demonstrating conditional controllability (e.g., motif scaffolding) but does not evaluate the *designability* or *physical plausibil
### Strengths:
1. The paper introduces a highly innovative approach by combining a **multi-scale autoregressive framework** with a **diffusion-based decoder** for protein backbone generation. This method allows for generating structures in a continuous coordinate space in a multi-scale manner.
2. The practical advantages of the multi-scale autoregressive model are compellingly demonstrated through the *"Backbone generation with human prompt"* and *"Zero-shot motif scaffolding"* applications.
1. **Lack of Analysis on Sampling Efficiency and Computational Cost:** A notable omission is the lack of analysis on sampling latency. While autoregressive (AR) models often present an advantage in generation speed over diffusion models in other domains, the PAR framework's architecture raises concerns. The model culminates in a diffusion decoder that appears computationally intensive (e.g., 1000 steps compared to Proteina's 400), and the preceding multi-scale AR stages introduce further comput
Taxonomy
Topics: Protein Structure and Dynamics · vaccines and immunoinformatics approaches · Bacteriophages and microbial interactions
