Deep Reinforcement Learning for Modelling Protein Complexes

Ziqi Gao; Tao Feng; Jiaxuan You; Chenyi Zi; Yan Zhou; Chen Zhang; Jia Li

arXiv:2405.02299 · cs.CE · May 8, 2024

Open Access · 3 Reviews

TL;DR

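This paper introduces GAPN, a Generative Adversarial Policy Network that uses deep reinforcement learning to assemble multi-chain protein complexes. It targets two obstacles: the $N^{N-2}$ combinatorial search space over assembly orders for $N$ chains, and the distribution shift between complexes with different chain numbers. The authors report improved accuracy and computational efficiency.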

Contribution

The work presents a new reinforcement learning framework, GAPN, for protein complex modeling that effectively handles large search spaces and scale variability.

Findings

01. Achieved significant accuracy improvements in RMSD and TM-Score.

02. Demonstrated enhanced computational efficiency over existing PCM methods.

03. Successfully modeled diverse protein complexes with varying chain numbers.

Abstract

AlphaFold can be used for both single-chain and multi-chain protein structure prediction, while the latter becomes extremely challenging as the number of chains increases. In this work, by taking each chain as a node and assembly actions as edges, we show that an acyclic undirected connected graph can be used to predict the structure of multi-chain protein complexes (a.k.a., protein complex modelling, PCM). However, there are still two challenges: 1) The huge combinatorial optimization space of $N^{N-2}$ ($N$ is the number of chains) for the PCM problem can easily lead to high computational cost. 2) The scales of protein complexes exhibit distribution shift due to variance in chain numbers, which calls for the generalization in modelling complexes of various scales. To address these challenges, we propose GAPN, a Generative Adversarial Policy Network powered by domain-specific rewards…
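The $N^{N-2}$ figure in the abstract matches Cayley's formula for the number of labeled trees on $N$ nodes, which is what an acyclic undirected connected graph over $N$ chains is. The sketch below is only an illustration of how quickly that search space grows; it is not the authors' method. The function names (`num_assembly_trees`, `prufer_to_edges`, `enumerate_trees`) are hypothetical, and the Prüfer-sequence enumeration is a standard way to list labeled trees that the paper is not claimed to use.

```python
import heapq
from itertools import product


def num_assembly_trees(n_chains: int) -> int:
    """Cayley's formula: the number of labeled trees on n nodes is n^(n-2)."""
    if n_chains < 1:
        raise ValueError("need at least one chain")
    if n_chains <= 2:
        return 1
    return n_chains ** (n_chains - 2)


def prufer_to_edges(seq, n_chains):
    """Decode a Prüfer sequence of length n-2 into the n-1 edges of a labeled tree."""
    degree = [1] * n_chains
    for node in seq:
        degree[node] += 1
    # Min-heap of current leaves (nodes with degree 1).
    leaves = [i for i in range(n_chains) if degree[i] == 1]
    heapq.heapify(leaves)
    edges = []
    for node in seq:
        leaf = heapq.heappop(leaves)  # smallest remaining leaf
        edges.append((min(leaf, node), max(leaf, node)))
        degree[node] -= 1
        if degree[node] == 1:
            heapq.heappush(leaves, node)
    # Exactly two leaves remain; they form the final edge.
    u = heapq.heappop(leaves)
    v = heapq.heappop(leaves)
    edges.append((min(u, v), max(u, v)))
    return edges


def enumerate_trees(n_chains):
    """Yield every labeled tree on n_chains nodes as a frozenset of edges.

    Brute force: only feasible for very small n_chains.
    """
    if n_chains < 2:
        raise ValueError("need at least two chains to form an edge")
    for seq in product(range(n_chains), repeat=n_chains - 2):
        yield frozenset(prufer_to_edges(seq, n_chains))


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, num_assembly_trees(n))
```

Exhaustive enumeration is only practical for a handful of chains: at $N = 10$ the count is already $10^8$, which illustrates why the abstract frames the problem as one of high computational cost.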

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

This work integrates reinforcement learning, graph neural networks, and adversarial training to tackle the challenging problem of protein complex modeling, which is novel. In detail, the authors identify the key issues of huge search space and lack of generalization across protein complexes of different sizes. The proposed GAPN framework well addresses these challenges through policy-based active search and an adversarial reward function that encodes global assembly knowledge. The graph represen

Weaknesses

It would be more helpful and intuitive to provide the assembly process of GAPN and MoLPC for the examples shown in Figure 4. For the efficiency analysis, it would be better to also theoretically analyze the exploration complexity and empirically analyze the relationship between efficiency and chain number N.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

# Problem domain

* The paper is an important problem domain: protein folding. Accurate protein folding could imply faster discovery of drugs, especially given the "virus era" erupted by COVID-19.
* The problem comes with some interesting challenges, specifically, models should be in(/equi)variant to rotation and translation.

# Enabling folding of larger proteins

In my understanding (per paper text), the earlier methods either take too long to simulate folding for larger proteins (e.g., >9 chai

Weaknesses

# Missing primer / prelim

It would be nice if you give some 4-line summary of terminology (even if brief). The ICLR audience might not be familiar with concepts like "docking" and "dimer".

# Adversarial reward Eq.3

The motivation and implementation of Eq.3 are ambiguous. It says that $p_{data}(x)$ is "the underlying distribution of ground-truth assembly action set". Can you give details to the support of the distribution? Is it actually "pairs of amino acid indices"? Do you even have that info

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The paper is well-organized and easy to follow.
2. Experimental results demonstrate the effectiveness of the proposed GAPN in both prediction accuracy and efficiency.

Weaknesses

This idea is not new, and the models they used are all well established.

Taxonomy

Topics: Protein Structure and Dynamics · Microbial Metabolic Engineering and Bioproduction · Viral Infectious Diseases and Gene Expression in Insects