Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
Zhanghan Ni, Yanjing Li, Zeju Qiu, Bernhard Schölkopf, Hongyu Guo, Weiyang Liu, Shengchao Liu

TL;DR
This paper introduces RigidSSL, a geometric pretraining framework for protein design that learns global structural priors and conformational dynamics, significantly improving generative modeling and downstream tasks.
Contribution
The paper presents RigidSSL, a novel two-phase geometric pretraining method that jointly models protein structure and dynamics, addressing limitations of local representations and static modeling.
Findings
Improves designability by up to 43% in protein generation
Enhances success rate by 5.8% in zero-shot motif scaffolding
Captures realistic conformational ensembles in GPCR modeling
Abstract
Generative models have recently advanced protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks, where pretraining can be a solution; (2) Current pretraining methods mostly rely on local, non-rigid atomic representations for property prediction downstream tasks, limiting global geometric understanding for protein generation tasks; and (3) Existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce RigidSSL, a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric…
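As a rough illustration of what perturbing rigid residue frames could look like in Phase I (RigidSSL-Perturb), the sketch below applies a small random rotation and a Gaussian translation to each backbone frame to produce a second view of a structure. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `random_small_rotation` and `perturb_frames`, the noise scales `rot_sigma` and `trans_sigma`, and the choice of a Gaussian axis-angle vector for rotational noise are all hypothetical, and the actual noise model may differ.

```python
import numpy as np


def random_small_rotation(rng, sigma):
    """Sample a rotation from a Gaussian axis-angle vector (Rodrigues' formula).

    Assumption: this is a simple stand-in for rotational noise, not
    necessarily the distribution used in the paper.
    """
    omega = rng.normal(scale=sigma, size=3)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    k = omega / theta
    # Skew-symmetric cross-product matrix of the unit rotation axis.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def perturb_frames(rotations, translations, rot_sigma=0.1, trans_sigma=0.5, seed=0):
    """Return a noisy copy of per-residue rigid frames (R_i, t_i).

    rotations:    array of shape (N, 3, 3), one rotation matrix per residue
    translations: array of shape (N, 3), one frame origin per residue
    The noise scales are hypothetical defaults chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    # Compose each frame's rotation with an independent small rotation.
    new_rotations = np.stack(
        [R @ random_small_rotation(rng, rot_sigma) for R in rotations]
    )
    # Add isotropic Gaussian noise to each frame origin.
    new_translations = translations + rng.normal(
        scale=trans_sigma, size=translations.shape
    )
    return new_rotations, new_translations


# Toy example: five identity frames at the origin.
R0 = np.tile(np.eye(3), (5, 1, 1))
t0 = np.zeros((5, 3))
R1, t1 = perturb_frames(R0, t0)
```

Because each rotation is composed with another proper rotation, the perturbed frames stay on SE(3): every residue remains a rigid body rather than a cloud of independently jittered atoms.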
Peer Reviews
Decision: ICLR 2026 Poster
Incorporating MD simulations into a pre-training step of protein design methods is an interesting and novel contribution.
The experiments demonstrate that RigidSSL-Perturb outperforms the baselines in designability and novelty, while RigidSSL-MD outperforms the baselines in diversity. However, RigidSSL-MD is not the method of choice with respect to designability and novelty. These results limit the applicability of the approach, as there seems to be no practical advantage of RigidSSL-MD over RigidSSL-Perturb, which is merely a simple augmentation of the data with Gaussian noise. The examples of generated structur
1. It utilizes the structural information available in large-scale protein datasets to pretrain a protein generation model in an unsupervised manner. 2. It achieves superior protein generation performance to the compared approaches.
1. This paper is more engineering-oriented. Though it achieves superior performance across several models on a protein generation benchmark, it seems to contain few new algorithms or architectures. Reference frame definition and flow matching are widely used across many areas, including, but not limited to, machine learning, computer vision, and computational biology. 2. The construction of two different conformation views is somewhat new, but the motivation for such a construction is un
The SE(3) rigidity pretraining for protein backbone generation is reasonable. Perturbations on rigidity align with the protein's natural conformational fluctuations, which can be interpreted as a masking-like paradigm. The MD snapshots used for pretraining are novel and interesting.
**W1. It is hard to determine whether the performance gains stem from the introduction of new data (e.g., AFDB, ATLAS) or the proposed rigidity-based geometric pretraining method. (My main concern)** The impact of RigidSSL-MD on diversity has been analyzed in lines 408–411 and Section 5. I think the improvements in diversity are attributable to the new data in the ATLAS dataset. For RigidSSL-Perturb, both FrameDiff and FoldFlow2 achieved improvements in designability and novelty. However, (1)
Taxonomy
Topics: Receptor Mechanisms and Signaling · Protein Structure and Dynamics · Machine Learning in Bioinformatics
