ParaFold: Paralleling AlphaFold for Large-Scale Predictions
Bozitao Zhong, Xiaoming Su, Minhua Wen, Sichen Zuo, Liang Hong, and James Lin

TL;DR
ParaFold is a parallelized, high-throughput version of AlphaFold that significantly accelerates large-scale protein structure predictions by optimizing CPU and GPU workflows, enabling rapid, cost-effective structural genomics research.
Contribution
ParaFold introduces a parallel framework that separates AlphaFold's CPU tasks from its GPU tasks and adds optimizations such as multi-threading and JAX compilation, improving the scalability and speed of AlphaFold predictions.
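The separation of CPU and GPU work described above can be pictured as a producer/consumer pipeline. The following is a minimal sketch under stated assumptions, not ParaFold's actual code: `build_features` and `run_inference` are hypothetical stand-ins for the CPU feature stage and the GPU inference stage, and the worker count is arbitrary.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_features(sequence: str) -> dict:
    """Hypothetical stand-in for the CPU stage (e.g. MSA construction)."""
    return {"sequence": sequence, "length": len(sequence)}


def run_inference(features: dict) -> str:
    """Hypothetical stand-in for the GPU stage (model inference)."""
    return f"structure(len={features['length']})"


def predict_all(sequences: list[str], cpu_workers: int = 4) -> dict[str, str]:
    """Run feature jobs in a CPU pool and feed one inference consumer."""
    ready: queue.Queue = queue.Queue()
    results: dict[str, str] = {}

    def gpu_consumer() -> None:
        # Take finished feature sets off the queue until the sentinel arrives.
        while True:
            item = ready.get()
            if item is None:
                break
            results[item["sequence"]] = run_inference(item)

    consumer = threading.Thread(target=gpu_consumer)
    consumer.start()

    # Feature jobs run concurrently; each one is queued as soon as it completes,
    # so inference does not wait for the whole batch of searches to finish.
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        futures = [pool.submit(build_features, s) for s in sequences]
        for fut in as_completed(futures):
            ready.put(fut.result())

    ready.put(None)  # sentinel: no more work
    consumer.join()
    return results


if __name__ == "__main__":
    print(predict_all(["MKV", "ACDEFG"]))
```

In a real deployment the two stages would more likely be separate jobs on different node types, with features written to disk in between. The in-process queue here only illustrates the decoupling.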
Findings
Achieved 13.8X speedup over AlphaFold with JAX optimization.
Predicted structures for 19,704 proteins in five hours on a single DGX-2.
Maintained prediction accuracy while significantly reducing runtime.
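As a rough sanity check on the second finding, the reported totals can be turned into an aggregate throughput. This is a back-of-the-envelope sketch: it treats "five hours" as exactly 5.0 hours, and the per-GPU figure assumes all 16 GPUs of a DGX-2 were in use, which the summary does not state.

```python
def throughput(n_proteins: int, hours: float) -> tuple[float, float]:
    """Return (proteins per hour, wall-clock seconds per protein)."""
    per_hour = n_proteins / hours
    seconds_each = hours * 3600 / n_proteins
    return per_hour, seconds_each


# Figures from the Findings list; 5.0 hours is taken at face value.
per_hour, seconds_each = throughput(19_704, 5.0)

# A DGX-2 ships with 16 GPUs; that all 16 were used is an assumption.
ASSUMED_GPUS = 16
gpu_seconds_each = seconds_each * ASSUMED_GPUS

print(f"{per_hour:.0f} proteins/hour")
print(f"{seconds_each:.2f} s per protein (aggregate wall clock)")
print(f"{gpu_seconds_each:.1f} GPU-seconds per protein (assuming {ASSUMED_GPUS} GPUs)")
```

These are averages over the whole run. The summary does not say which pipeline stages the five hours cover, so the numbers should not be read as per-stage timings.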
Abstract
AlphaFold predicts protein structures from amino acid sequences at or near experimental resolution, solving the 50-year-old protein folding challenge and enabling large-scale genomics data to be transformed into protein structures. AlphaFold will also shift the scientific research model from low-throughput to high-throughput. The AlphaFold framework is a mixture of two types of workloads: MSA construction on CPUs and model inference on GPUs. The CPU stage comes first and dominates the overall runtime, taking hours for a single protein because of the large database sizes and I/O bottlenecks. GPUs remain idle during this CPU stage, which results in low GPU utilization and restricts the capacity for large-scale structure predictions. Therefore, we propose ParaFold, an open-source parallel version of AlphaFold for high-throughput protein structure predictions. ParaFold…
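The utilization argument in the abstract can be made concrete with a small calculation. The timings below are hypothetical placeholders chosen for illustration, not measurements from the paper. The sketch shows how GPU utilization falls when the two stages run back to back on the same node, and roughly how many concurrent CPU feature jobs would be needed to keep one GPU busy once the stages are decoupled.

```python
import math


def gpu_utilization(cpu_seconds: float, gpu_seconds: float) -> float:
    """Fraction of wall-clock time the GPU is busy when stages run back to back."""
    return gpu_seconds / (cpu_seconds + gpu_seconds)


def cpu_jobs_per_gpu(cpu_seconds: float, gpu_seconds: float) -> int:
    """Concurrent CPU feature jobs needed so one GPU always has work queued."""
    return math.ceil(cpu_seconds / gpu_seconds)


# Hypothetical per-protein timings (assumptions, not from the paper).
cpu_s, gpu_s = 3600.0, 400.0

coupled = gpu_utilization(cpu_s, gpu_s)
jobs = cpu_jobs_per_gpu(cpu_s, gpu_s)

print(f"coupled GPU utilization: {coupled:.0%}")
print(f"CPU feature jobs needed per GPU when decoupled: {jobs}")
```

The second figure ignores I/O contention between concurrent database searches, which the abstract names as a bottleneck, so in practice the required number of jobs could be higher.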
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Parallel Computing and Optimization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · AlphaFold
