Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles

Zhanghan Ni; Yanjing Li; Zeju Qiu; Bernhard Schölkopf; Hongyu Guo; Weiyang Liu; Shengchao Liu

arXiv:2603.02406·cs.LG·March 9, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces RigidSSL, a geometric pretraining framework for protein design that learns global structural priors and conformational dynamics, significantly improving generative modeling and downstream tasks.

Contribution

The paper presents RigidSSL, a novel two-phase geometric pretraining method that jointly models protein structure and dynamics, addressing limitations of local representations and static modeling.

Findings

01. Improves designability by up to 43% in protein generation

02. Enhances success rate by 5.8% in zero-shot motif scaffolding

03. Captures realistic conformational ensembles in GPCR modeling

Abstract

Generative models have recently advanced de novo protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks, where pretraining can be a solution; (2) Current pretraining methods mostly rely on local, non-rigid atomic representations for property prediction downstream tasks, limiting global geometric understanding for protein generation tasks; and (3) Existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce RigidSSL (Rigidity-Aware Self-Supervised Learning), a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric…
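To make the contrast between rigid and non-rigid atomic representations concrete, here is a minimal sketch of perturbing a single rigid residue frame. It is not the paper's implementation: the function names, the local backbone coordinates, the noise scales, and the use of a Gaussian rotation vector as the rotation noise are all assumptions chosen for illustration. The point it shows is that noise applied to the frame (a rotation plus a translation) moves the residue as a whole while leaving its internal geometry unchanged.

```python
import numpy as np


def rotvec_to_matrix(omega):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def perturb_frame(R, t, sigma_rot, sigma_trans, rng):
    """Add noise to a rigid frame (R, t).

    Assumption: rotation noise is a Gaussian rotation vector composed with R,
    a simple stand-in rather than the noise model used in the paper.
    """
    omega = rng.normal(scale=sigma_rot, size=3)
    R_noisy = R @ rotvec_to_matrix(omega)
    t_noisy = t + rng.normal(scale=sigma_trans, size=3)
    return R_noisy, t_noisy


def apply_frame(R, t, local_xyz):
    """Map local atom coordinates of shape (n_atoms, 3) into global coordinates."""
    return local_xyz @ R.T + t


def pairwise_distances(xyz):
    """All-pairs Euclidean distances between atoms."""
    return np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)


# Hypothetical local coordinates (in angstroms) for backbone atoms N, CA, C.
LOCAL_BACKBONE = np.array([[-0.5, 1.4, 0.0],
                           [0.0, 0.0, 0.0],
                           [1.5, 0.0, 0.0]])

rng = np.random.default_rng(0)
R0, t0 = np.eye(3), np.zeros(3)
R1, t1 = perturb_frame(R0, t0, sigma_rot=0.1, sigma_trans=1.0, rng=rng)

before = apply_frame(R0, t0, LOCAL_BACKBONE)
after = apply_frame(R1, t1, LOCAL_BACKBONE)

# The atoms move, but intra-residue distances are preserved.
print(np.allclose(pairwise_distances(before), pairwise_distances(after)))
```

By contrast, adding independent Gaussian noise to each atom's coordinates would change the bond lengths and angles inside the residue, which is what a non-rigid perturbation amounts to.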

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

Incorporating MD simulations into a pre-training step of protein design methods is an interesting and novel contribution.

Weaknesses

The experiments demonstrate that RigidSSL-Perturb outperforms the baselines for designability and novelty, while RigidSSL-MD outperforms the baselines in diversity. However, RigidSSL-MD is not the method of choice with respect to designability and novelty. These results limit the applicability of the approach, as there seems to be no practical advantage of RigidSSL-MD over RigidSSL-Perturb, which is merely a simple augmentation of the data with Gaussian noise. The examples of generated structur

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. It utilizes the structural information available in large-scale protein datasets to pretrain a protein generation model in an unsupervised manner.
2. It achieves protein generation performance superior to that of the compared approaches.

Weaknesses

1. This paper is more engineering-oriented. Though it achieves superior performance across several models on a protein generation benchmark, it seems to contain few new algorithms or architectures. Reference frame definition and flow matching are widely used across many areas, including, but not limited to, machine learning, computer vision, and computational biology.
2. The construction of two different conformation views is somewhat new, but the motivation for such a construction is un

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The SE(3) rigidity pretraining for protein backbone generation is reasonable. Perturbations on rigidity align with the protein's natural conformational fluctuations, which can be interpreted as a masking-like paradigm. The MD snapshots used for pretraining are novel and interesting.

Weaknesses

**W1. It is hard to determine whether the performance gains stem from the introduction of new data (e.g., AFDB, ATLAS) or the proposed rigidity-based geometric pretraining method. (My main concern)** The impact of RigidSSL-MD on diversity has been analyzed in lines 408–411 and Section 5. I think the improvements in diversity are attributable to the new data in the ATLAS dataset. For RigidSSL-Perturb, both FrameDiff and FoldFlow2 achieved improvements in designability and novelty. However, (1)

Taxonomy

Topics: Receptor Mechanisms and Signaling · Protein Structure and Dynamics · Machine Learning in Bioinformatics