An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Vladimir Mironov, Yuri Alexeev, Kristopher Keipert, Michael D'mello, Alexander Moskovsky, Mark S. Gordon

TL;DR
This paper presents a hybrid MPI/OpenMP parallelization of the Hartree-Fock method in GAMESS, achieving significant speedup and memory reduction on Intel Xeon Phi supercomputers.
Contribution
It introduces a novel hybrid MPI/OpenMP implementation for Hartree-Fock calculations that improves performance and reduces memory usage on many-core architectures.
Findings
Up to sixfold speedup over the original code.
Memory footprint reduced by approximately 200 times.
Scalable performance on up to 192,000 cores.
Abstract
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations are considered; they differ in whether the key data structures, the density and Fock matrices, are shared or replicated among threads. All implementations are benchmarked on a supercomputer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.
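To make the sharing-versus-replication trade-off concrete, the following is a rough back-of-the-envelope sketch, not the GAMESS implementation. It assumes dense square double-precision matrices, one MPI rank per core in the MPI-only layout, and a shared density matrix in both hybrid variants; the function names and the example basis-set size are hypothetical. It is not meant to reproduce the reported 200-fold reduction, which depends on details not given in the abstract.

```python
# Hypothetical memory model for per-node density/Fock storage.
# Assumptions (not from the paper): dense n x n double-precision matrices,
# one MPI rank per core in the MPI-only layout, density shared in hybrid runs.

BYTES_PER_DOUBLE = 8


def matrix_bytes(n_basis: int) -> int:
    """Bytes for one dense n_basis x n_basis double-precision matrix."""
    return n_basis * n_basis * BYTES_PER_DOUBLE


def mpi_only_bytes(n_basis: int, ranks_per_node: int) -> int:
    """Every rank replicates both the density and the Fock matrix."""
    return ranks_per_node * 2 * matrix_bytes(n_basis)


def hybrid_private_fock_bytes(n_basis: int, threads: int) -> int:
    """One rank per node: one shared density, one Fock copy per thread."""
    return (1 + threads) * matrix_bytes(n_basis)


def hybrid_shared_fock_bytes(n_basis: int) -> int:
    """One rank per node: density and Fock are both shared by all threads."""
    return 2 * matrix_bytes(n_basis)


# Core count quoted in the abstract: 3,000 processors with 64 cores each.
PROCESSORS = 3000
CORES_PER_PROCESSOR = 64
TOTAL_CORES = PROCESSORS * CORES_PER_PROCESSOR

if __name__ == "__main__":
    n_basis = 1000  # hypothetical basis-set size, for illustration only
    print("total cores:", TOTAL_CORES)
    print("MPI-only bytes/node:", mpi_only_bytes(n_basis, CORES_PER_PROCESSOR))
    print("private-Fock bytes/node:",
          hybrid_private_fock_bytes(n_basis, CORES_PER_PROCESSOR))
    print("shared-Fock bytes/node:", hybrid_shared_fock_bytes(n_basis))
```

Under these assumptions, the shared-Fock layout keeps per-node storage independent of the thread count, while the private-Fock layout grows linearly with it; the price of sharing is that threads must coordinate their updates to the common Fock matrix.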
