De novo protein design using geometric vector field networks
Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen

TL;DR
This paper introduces Vector Field Network (VFN), a novel encoder for de novo protein design that models residue frames and atoms more effectively, leading to improved performance in protein diffusion and inverse folding tasks.
Contribution
The paper proposes VFN, a universal encoder that performs learnable vector computations for better frame and atom modeling in protein design, surpassing existing methods.
Findings
VFN-Diff outperforms IPA on protein diffusion in designability (67.04% vs. 53.58%).
VFN-IF achieves a higher sequence recovery rate in inverse folding than PiFold (54.7% vs. 51.66%).
VFN combined with an ESM model surpasses the previous best ESM-based model in sequence recovery (62.67% vs. 55.65%).
Abstract
Innovations like protein diffusion have enabled significant progress in de novo protein design, which is a vital topic in life science. These methods typically depend on protein structure encoders to model residue backbone frames, where atoms do not exist. Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context. Thus far, only several simple encoders, such as IPA, have been proposed for this scenario, exposing the frame modeling as a bottleneck. In this work, we proffer the Vector Field Network (VFN), which enables network layers to perform learnable vector computations between coordinates of frame-anchored virtual atoms, thus achieving a higher capability for modeling frames. The vector computation operates in a manner similar to a linear layer, with each input channel receiving 3D virtual atom coordinates…
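The vector computation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the tensor shapes, the choice to share one set of virtual atoms across all residues, and the use of a plain weight matrix `W` are all assumptions made for illustration. The sketch places virtual atoms in each residue frame, expresses residue j's atoms in the frame of residue i, takes difference vectors, and mixes them across channels in the manner of a linear layer.

```python
import numpy as np

def random_rotation(rng):
    # QR of a Gaussian matrix yields an orthogonal matrix; flip one column
    # if needed so the determinant is +1 (a proper rotation).
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def vector_field_features(R, t, atoms_local, W):
    """Hypothetical sketch of a linear-layer-like vector computation.

    R:           (N, 3, 3) rotation of each residue frame
    t:           (N, 3)    origin of each residue frame
    atoms_local: (K_in, 3) virtual atom coordinates in the local frame
                 (assumed learnable and shared across residues)
    W:           (K_out, K_in) channel-mixing weights (assumed learnable)
    Returns an (N, N, K_out, 3) array of vectors expressed in frame i.
    """
    # Global coordinates of every virtual atom: x[n, k] = R[n] @ a[k] + t[n]
    atoms_global = np.einsum("nab,kb->nka", R, atoms_local) + t[:, None, :]
    # Express residue j's atoms in frame i: R_i^T (x_j - t_i)
    rel = atoms_global[None, :, :, :] - t[:, None, None, :]
    atoms_j_in_i = np.einsum("iba,ijkb->ijka", R, rel)
    # Difference between j's atoms and i's own atoms, both in frame i
    diff = atoms_j_in_i - atoms_local[None, None, :, :]
    # Mix vectors across channels; xyz components are never mixed
    return np.einsum("ok,ijka->ijoa", W, diff)

rng = np.random.default_rng(0)
N, K_in, K_out = 4, 5, 3
R = np.stack([random_rotation(rng) for _ in range(N)])
t = rng.normal(size=(N, 3))
atoms_local = rng.normal(size=(K_in, 3))
W = rng.normal(size=(K_out, K_in))

features = vector_field_features(R, t, atoms_local, W)
print(features.shape)
```

Because every vector is expressed in the local frame of residue i and the weights only mix channels, the output of this sketch does not change when the whole structure is rotated or translated. Whether the paper's layers share this exact form is not stated in the truncated abstract above.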
Peer Reviews
Decision · ICLR 2024 spotlight
I agree with the authors that there has been an over-reliance on IPA in the literature for protein tasks. It makes a lot of sense to investigate improvements to it, so the paper does target a very important problem in my eyes. Introducing effectively more data channels into the model to increase its capacity is also very sensible. Importantly, the model is shown to improve the results on the most common and important protein modeling tasks.
The general reasoning for why the proposed architecture works better, and why it should be constructed the way it is, rests on the concept of an atom representation bottleneck. But this bottleneck is not really introduced or investigated in a rigorous manner. Maybe the authors can at least give concrete theoretical counterexamples of problems that can be modeled with VFN but not IPA. At the least, an experimental ablation varying the virtual node count would be interesting, to see how the performance changes. I…
The proposed model achieves a new SOTA score on the CATH 4.2 benchmark.
1. **The architecture design lacks some novelty:** It seems the protein structure design part (VFN-Diff) borrows some ideas from FrameDiff, while for the inverse folding part (VFN-IF), the virtual atom is similar to that of PiFold and the node interaction (Equations 4, 5, 6) is similar to the node gating mechanism in PiFold.
2. **The problem setting is unfair:** In the first paragraph of Section 4, the authors mention "In the protein diffusion part, the protein structure is designed and rep…
* The paper establishes a new entry in the design space of residue frame-based architectures, an exciting direction for protein representation learning.
* The experimental results are quite strong and establish that VFN could be used as a drop-in replacement for alternative SOTA architectures.
* The non-exchangeable treatment of virtual atoms (unlike IPA) leaves open the possibility of using the framework for real sidechain atoms.
* The thesis of the paper would be improved by better contextualization relative to IPA. The authors should not shy away from acknowledging significant similarities, but highlight the key changes and the insights behind them. I would suggest a side-by-side algorithmic comparison.
* The paper could be further strengthened by additional comparisons with IPA. Particularly, if we replace IPA in AlphaFold/ESMFold with VFN, does the performance persist? It should not be too hard to run this experiment
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Genomics and Chromatin Dynamics
Methods: Diffusion
