Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

Axel Levy; Eric R. Chan; Sara Fridovich-Keil; Frédéric Poitevin; Ellen D. Zhong; Gordon Wetzstein

arXiv:2406.04239·cs.LG·April 24, 2025

Open Access · 3 Reviews

TL;DR

This paper presents a versatile diffusion-based framework that transforms biophysical measurements like cryo-EM maps into detailed 3D protein structures, outperforming existing methods and enabling new applications.

Contribution

It introduces the first diffusion-based approach for refining atomic models from cryo-EM data and constructing structures from sparse distance matrices, integrating physics-based models with pretrained generative priors.

Findings

01. Outperforms posterior sampling baselines on inverse problems

02. First diffusion-based method for cryo-EM atomic model refinement

03. Enables structure building from sparse distance matrices

Abstract

The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these inverse problems for 3D structure determination, but are specialized for a predefined type of measurement. Here, we introduce a versatile framework to turn biophysical measurements, such as cryo-EM density maps, into 3D atomic models. Our method combines a physics-based forward model of the measurement process with a pretrained generative model providing a task-agnostic, data-driven prior. Our method outperforms posterior sampling baselines on linear and non-linear inverse problems. In particular, it…
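The combination of a forward model with a learned prior described in the abstract can be illustrated with a toy sketch. The code below is not the paper's algorithm, which this page does not spell out. It is a minimal plug-and-play style alternation (half-quadratic splitting) between a denoising step and a data-consistency step, under assumptions chosen for illustration: a linear forward model `y = A x + noise`, and a Gaussian prior whose closed-form denoiser stands in for a pretrained diffusion model. All function names and parameters are hypothetical.

```python
import numpy as np


def gaussian_denoiser(x_noisy, sigma_t, prior_mean, prior_std):
    """Stand-in for a pretrained denoiser (assumption: Gaussian prior).

    Returns the posterior mean of x given x_noisy = x + sigma_t * noise
    when x ~ N(prior_mean, prior_std^2 I).
    """
    w = prior_std**2 / (prior_std**2 + sigma_t**2)
    return w * x_noisy + (1.0 - w) * prior_mean


def data_step(z, y, A, sigma_y, sigma_t):
    """Data-consistency step for a linear forward model.

    Solves argmin_x ||y - A x||^2 / (2 sigma_y^2) + ||x - z||^2 / (2 sigma_t^2)
    in closed form via the normal equations.
    """
    n = A.shape[1]
    lhs = A.T @ A / sigma_y**2 + np.eye(n) / sigma_t**2
    rhs = A.T @ y / sigma_y**2 + z / sigma_t**2
    return np.linalg.solve(lhs, rhs)


def plug_and_play(y, A, sigma_y, prior_mean, prior_std, sigmas):
    """Alternate prior (denoising) and measurement (data) steps.

    `sigmas` is a decreasing noise schedule: early iterations lean on the
    prior, later ones make only small corrections.
    """
    x = prior_mean.copy()
    for sigma_t in sigmas:
        z = gaussian_denoiser(x, sigma_t, prior_mean, prior_std)
        x = data_step(z, y, A, sigma_y, sigma_t)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_unknowns, n_measurements = 6, 3          # under-determined problem
    A = rng.normal(size=(n_measurements, n_unknowns))
    prior_mean = np.zeros(n_unknowns)
    x_true = rng.normal(size=n_unknowns)       # drawn from the toy prior
    sigma_y = 0.05
    y = A @ x_true + sigma_y * rng.normal(size=n_measurements)

    schedule = np.geomspace(1.0, 0.01, 20)     # hypothetical schedule
    x_hat = plug_and_play(y, A, sigma_y, prior_mean, 1.0, schedule)
    print("residual at prior mean:", np.linalg.norm(y - A @ prior_mean))
    print("residual at estimate:  ", np.linalg.norm(y - A @ x_hat))
```

In a real setting the Gaussian denoiser would be replaced by a pretrained diffusion model over protein structures, and `data_step` by an optimization step against a measurement model such as a simulated cryo-EM density map. Both substitutions go beyond what this sketch shows.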

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

As far as my knowledge goes, this work is one of the first to leverage a pretrained diffusion model of protein atomic structures as a Bayes prior, to solve the inverse problems which are very common in structural biology. The connection between the generative model and the experimental observation is very important in expanding the scope of the AI for science field.

Weaknesses

1. Since the paper uses pretrained diffusion models from Chroma and RFDiffusion, and the measurement models for the tasks are apparent, from a methodological standpoint, my understanding is that the main contribution of this paper is the MAP estimation method given a diffusion prior. As ADP-3D seems like a generic algorithm that is not heavily tailored for structural biology, there are many similar methods in the field of diffusion posterior sampling for inverse problems, e.g. DPS, $\Pi$GDM, as

Reviewer 02 · Rating 3 · Confidence 4

Strengths

* A general approach to combining pretrained diffusion models with data based on a variable-splitting framework.
* The resulting algorithm is quite simple and allows for combining pretrained diffusion models with new data (thereby avoiding a retraining step).
* The approach is illustrated for various protein structure modeling tasks.

Weaknesses

* The general approach (Algorithm 1) seems to be a minor modification of existing work.
* The method is illustrated only for simulated data.

__Recommendation__

I recommend rejecting the article. ADP-3D is a straightforward application of the plug-and-play framework to protein diffusion models. The idea of using a variable-splitting approach to combine diffusion models with additional data has already been proposed by Zhu et al. (2023) in the context of image restoration. So the major novelty is

Reviewer 03 · Rating 8 · Confidence 3

Strengths

- It is very impressive to see the results for all three inverse problems. Some aspects that are highlights to me:
  - In the structure completion it is great that the authors also test this for different levels of incompleteness. For these methods, it is good to showcase limitations to the reader.
  - I am very impressed by outperforming ModelAngelo, which is as the authors mention (and to the best of my knowledge) the state-of-the-art in model building. Being able to refine the predictions

Weaknesses

- The writing can be improved upon. In particular, the introduction and the related work could really benefit from restructuring and adding additional material. This would benefit the overall flow of the paper and provide the reader with a clearer overview of the problems being tackled in the paper. Detailed suggestions below:
  - Regarding the introduction:
    - Please allow me to paraphrase:
      - Paragraph 1: Proteins can be inferred in different ways through solving an inverse

Taxonomy

Topics: Protein Structure and Dynamics