TL;DR
This paper reformulates the distributed-memory ab initio DMRG algorithm as a sum of sub-Hamiltonians for high-performance computing platforms, and reports near-ideal parallel scaling from 448 to 2800 CPU cores on large benchmark systems.
Contribution
It reformulates the conventional distributed-memory ab initio DMRG algorithm, connecting it to the sum of sub-Hamiltonians approach, and explores a hierarchy of five parallelism strategies built on that framework.
Findings
Near-ideal parallel scaling from 448 to 2800 CPU cores.
Reduced processor load imbalance and communication cost, yielding higher parallel efficiency.
Successful application to large benchmark systems like benzene and FeMo cofactor.
Abstract
There has been recent interest in the deployment of ab initio density matrix renormalization group (DMRG) computations on high-performance computing platforms. Here, we introduce a reformulation of the conventional distributed-memory ab initio DMRG algorithm that connects it to the conceptually simpler and advantageous sum of sub-Hamiltonians approach. Starting from this framework, we further explore a hierarchy of parallelism strategies that includes (i) parallelism over the sum of sub-Hamiltonians, (ii) parallelism over sites, (iii) parallelism over normal and complementary operators, (iv) parallelism over symmetry sectors, and (v) parallelism over dense matrix multiplications. We describe how to reduce processor load imbalance and the communication cost of the algorithm to achieve higher efficiencies. We illustrate the performance of our new open-source implementation on a recent benchmark…
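The idea behind strategy (i) can be illustrated with a toy example. The sketch below is not the paper's implementation: it uses small dense random matrices in place of real DMRG operators, a hypothetical round-robin assignment of terms to workers, and a plain sum in place of a distributed all-reduce. The function names (`split_into_sub_hamiltonians`, `apply_sub_hamiltonian`, `apply_hamiltonian`) are invented for illustration. It shows only that if H is written as a sum of sub-Hamiltonians, each worker can apply its own part to the wavefunction independently, and the partial results add up to the full product.

```python
import numpy as np

def split_into_sub_hamiltonians(terms, n_workers):
    """Assign terms to workers round-robin (a hypothetical scheme, chosen for simplicity)."""
    return [terms[w::n_workers] for w in range(n_workers)]

def apply_sub_hamiltonian(sub_terms, psi):
    """Apply one worker's sub-Hamiltonian (the sum of its terms) to psi."""
    out = np.zeros_like(psi)
    for term in sub_terms:
        out += term @ psi
    return out

def apply_hamiltonian(groups, psi):
    """Sum the per-worker partial products.

    In a distributed run each partial product would be computed on its own
    process; the final sum stands in for an all-reduce.
    """
    partials = [apply_sub_hamiltonian(g, psi) for g in groups]
    return np.sum(partials, axis=0)

# Toy data: random symmetric matrices stand in for Hamiltonian terms.
rng = np.random.default_rng(0)
dim, n_terms, n_workers = 8, 10, 4
terms = []
for _ in range(n_terms):
    a = rng.standard_normal((dim, dim))
    terms.append(a + a.T)
psi = rng.standard_normal(dim)

groups = split_into_sub_hamiltonians(terms, n_workers)
sigma_parallel = apply_hamiltonian(groups, psi)
sigma_serial = sum(terms) @ psi

# The decomposed product agrees with applying the full Hamiltonian at once.
assert np.allclose(sigma_parallel, sigma_serial)
```

Round-robin assignment ignores how expensive each term is. With terms of unequal cost, a cost-aware assignment would be needed to address the load imbalance the abstract mentions; the abstract does not say which scheme the authors use.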
