HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle
Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma

TL;DR
HelixFold is an optimized implementation of AlphaFold2 using PaddlePaddle that significantly reduces training time and memory usage while maintaining high accuracy, making protein structure prediction more accessible.
Contribution
This work introduces HelixFold, a PaddlePaddle-based implementation of AlphaFold2 that improves efficiency and reduces resource requirements compared to existing versions.
Findings
HelixFold completes training in 7.5 days, compared with 11 days for AlphaFold2 and OpenFold.
HelixFold maintains comparable accuracy to AlphaFold2 on CASP14 and CAMEO datasets.
Memory optimizations, including Recompute, reduce memory consumption and make protein structure prediction more accessible.
Abstract
Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of the experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires lots of computational resources and time to implement the training and inference of AlphaFold2 from scratch. The cost of running the original AlphaFold2 is expensive for most individuals and institutions. Therefore, reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. The performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while the memory is optimized through Recompute,…
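The abstract names tensor fusion as one of the speed optimizations, but this excerpt does not describe how it is implemented. As a rough illustration of the general idea only, the sketch below packs several small parameter tensors into one flat buffer so that a single update pass replaces one pass per tensor. The function names (`fuse`, `sgd_step`, `unfuse`), the plain SGD update, and the toy values are all hypothetical; none of this is HelixFold's code or a PaddlePaddle API.

```python
# Illustrative sketch only: not HelixFold's code and not a PaddlePaddle API.
# All names and values here are hypothetical.
from typing import Dict, List, Tuple

Offsets = Dict[str, Tuple[int, int]]


def fuse(tensors: Dict[str, List[float]]) -> Tuple[List[float], Offsets]:
    """Pack many small tensors into one flat buffer and record each slice."""
    buffer: List[float] = []
    offsets: Offsets = {}
    for name, values in tensors.items():
        start = len(buffer)
        buffer.extend(values)
        offsets[name] = (start, len(buffer))
    return buffer, offsets


def sgd_step(params: List[float], grads: List[float], lr: float) -> None:
    """One pass over the fused buffer stands in for a single fused update op."""
    for i, g in enumerate(grads):
        params[i] -= lr * g


def unfuse(buffer: List[float], offsets: Offsets) -> Dict[str, List[float]]:
    """Recover the per-tensor values from the flat buffer."""
    return {name: buffer[start:end] for name, (start, end) in offsets.items()}


# Toy parameters and gradients (assumed values, chosen for easy checking).
params = {"w1": [1.0, 2.0], "b1": [0.5], "w2": [3.0, 4.0, 5.0]}
grads = {"w1": [1.0, 1.0], "b1": [2.0], "w2": [0.0, 4.0, 2.0]}

flat_params, offsets = fuse(params)
flat_grads, _ = fuse(grads)  # same insertion order, so slices line up
sgd_step(flat_params, flat_grads, lr=0.5)
updated = unfuse(flat_params, offsets)
```

In a real framework the flat buffer would be a contiguous device allocation and the update a single kernel launch; the Python lists here only mimic the bookkeeping.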
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Enzyme Structure and Function
