TL;DR
This paper presents a distributed Trotter-Suzuki solver for the Schrödinger equation that leverages hybrid CPU-GPU kernels, achieving near-linear scaling and improved performance for large quantum systems on clusters.
Contribution
It introduces a hybrid kernel combining multicore CPUs and GPUs, enhancing efficiency for large matrices that exceed GPU memory, and demonstrates scalable performance on distributed systems.
Findings
Near-linear scaling with CPU kernels
GPU kernel efficiency increases with larger matrices
Hybrid kernel improves performance for large quantum systems
Abstract
The Trotter-Suzuki approximation leads to an efficient algorithm for solving the time-dependent Schrödinger equation. Using existing highly optimized CPU and GPU kernels, we developed a distributed version of the algorithm that runs efficiently on a cluster. Our implementation also improves single-node performance and can use multiple GPUs within a node. The scaling is close to linear using the CPU kernels, whereas the efficiency of the GPU kernels improves with larger matrices. We also introduce a hybrid kernel that simultaneously uses multicore CPUs and GPUs in a distributed system. This kernel is shown to be efficient when the matrix is too large to fit in GPU memory. Larger quantum systems scale especially well with a high number of nodes. The code is available under an open source license.
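To illustrate the kind of update a Trotter-Suzuki solver performs, here is a minimal single-process sketch in pure Python. It is not the paper's kernel. It assumes a 1D open chain with a hypothetical on-site `potential` and nearest-neighbour `hopping` amplitude, and uses the symmetric second-order splitting in which the hopping term is broken into even and odd bond pairs, each of which can be exponentiated exactly as a 2x2 rotation. The function names `apply_pairs`, `trotter_step`, and `norm` are illustrative only.

```python
import cmath
import math


def apply_pairs(psi, start, theta):
    """Apply the exact 2x2 propagator exp(i*theta*sigma_x) to the
    amplitude pairs (start, start+1), (start+2, start+3), ..."""
    c = math.cos(theta)
    s = 1j * math.sin(theta)
    out = list(psi)
    for j in range(start, len(psi) - 1, 2):
        a, b = psi[j], psi[j + 1]
        out[j] = c * a + s * b
        out[j + 1] = s * a + c * b
    return out


def trotter_step(psi, potential, hopping, dt):
    """One symmetric (second-order) step for
    H = V - hopping * sum_j (|j><j+1| + h.c.) on an open chain."""
    # Half step of the diagonal potential term.
    psi = [cmath.exp(-0.5j * v * dt) * a for v, a in zip(potential, psi)]
    # Half step on even bonds, full step on odd bonds, half step on even bonds.
    psi = apply_pairs(psi, 0, 0.5 * hopping * dt)
    psi = apply_pairs(psi, 1, hopping * dt)
    psi = apply_pairs(psi, 0, 0.5 * hopping * dt)
    # Second half step of the potential term.
    psi = [cmath.exp(-0.5j * v * dt) * a for v, a in zip(potential, psi)]
    return psi


def norm(psi):
    """Euclidean norm of the state vector."""
    return math.sqrt(sum(abs(a) ** 2 for a in psi))


if __name__ == "__main__":
    n = 8
    psi = [0j] * n
    psi[n // 2] = 1 + 0j            # particle localized on one site
    potential = [0.1 * j for j in range(n)]
    for _ in range(200):
        psi = trotter_step(psi, potential, hopping=1.0, dt=0.01)
    print(norm(psi))                # stays close to 1: every factor is unitary
```

Because each factor only touches neighbouring amplitudes, the state can be cut into tiles that exchange just their boundary values, which is the property a distributed implementation of this scheme would rely on.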
