Accelerating Pythonic coupled cluster implementations: a comparison between CPUs and GPUs
Maximilian H. Kriebel, Paweł Tecmer, Marta Gałyńska, Aleksandra Leszczyk, and Katharina Boguslawski

TL;DR
This paper demonstrates that offloading tensor contractions in Pythonic coupled cluster calculations to GPUs using CuPy significantly accelerates computations. A hybrid CPU-GPU approach achieves up to a 16-fold speed-up over CPU-only implementations.
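The following is a minimal sketch of what offloading a single contraction to the GPU can look like. It assumes CuPy's NumPy-compatible `cupy.asarray`, `cupy.einsum`, and `cupy.asnumpy`. The helper name `contract_on_gpu`, the tensor shapes, and the example contraction are hypothetical illustrations, not PyBEST code. When CuPy or a CUDA device is unavailable, the sketch falls back to NumPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device is present
    xp = cp
except Exception:  # CuPy missing or no GPU: run the same contraction on the CPU
    cp = None
    xp = np


def contract_on_gpu(subscripts, *operands):
    """Hypothetical helper: copy NumPy operands to the GPU, contract there, copy the result back."""
    if xp is np:
        return np.einsum(subscripts, *operands)
    device_ops = [cp.asarray(op) for op in operands]  # host -> device transfer
    result = cp.einsum(subscripts, *device_ops)       # the contraction itself runs on the GPU
    return cp.asnumpy(result)                         # device -> host transfer


# Example: contract doubles amplitudes t_ijcd with a four-index virtual block (illustrative shapes)
nocc, nvirt = 4, 10
rng = np.random.default_rng(0)
t2 = rng.standard_normal((nocc, nocc, nvirt, nvirt))
eri_vvvv = rng.standard_normal((nvirt, nvirt, nvirt, nvirt))
r2 = contract_on_gpu("ijcd,cdab->ijab", t2, eri_vvvv)
print(r2.shape)  # (4, 4, 10, 10)
```

Because CuPy mirrors the NumPy API, the contraction string is identical on both devices. Only the explicit host-device copies differ.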
Contribution
It provides a detailed comparison of CPU and GPU implementations for coupled cluster methods, highlighting the performance gains of GPU acceleration with CuPy in electronic structure calculations.
Findings
The CuPy GPU implementation is up to 10 times faster than calculations on 36 CPUs.
Benchmarks identify the optimal Python routines for the tensor contractions.
A hybrid CPU-GPU approach achieves up to a 16-fold speed-up.
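Comparing contraction routines can be sketched on the CPU. This example assumes the candidates are `numpy.einsum` (with and without `optimize=True`) and `numpy.tensordot`. The routine list, the tensor shapes, and the helper `time_routine` are illustrative choices and not necessarily the routines benchmarked in the paper. Memory usage is not measured here.

```python
import time

import numpy as np


def time_routine(fn, repeats=3):
    """Return the best wall-clock time (in seconds) over several repeats, plus the last result."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


nocc, nvirt = 6, 20
rng = np.random.default_rng(1)
t2 = rng.standard_normal((nocc, nocc, nvirt, nvirt))
w = rng.standard_normal((nvirt, nvirt, nvirt, nvirt))

routines = {
    "einsum": lambda: np.einsum("ijcd,cdab->ijab", t2, w),
    "einsum(optimize=True)": lambda: np.einsum("ijcd,cdab->ijab", t2, w, optimize=True),
    # tensordot contracts the last two axes of t2 with the first two axes of w
    "tensordot": lambda: np.tensordot(t2, w, axes=([2, 3], [0, 1])),
}

timings = {}
results = {}
for name, fn in routines.items():
    timings[name], results[name] = time_routine(fn)

reference = results["einsum"]
for name in routines:
    assert np.allclose(results[name], reference)  # every routine yields the same tensor
    print(f"{name:>22s}: {timings[name] * 1e3:8.2f} ms")
```

`numpy.tensordot` reshapes the operands and calls a single matrix product, which is often why it is competitive. Actual rankings depend on the tensor sizes and the underlying BLAS library.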
Abstract
We scrutinize how to accelerate the bottleneck operations of Pythonic coupled cluster implementations performed on an NVIDIA Tesla V100S PCIe 32GB (rev 1a) Graphics Processing Unit (GPU). The NVIDIA Compute Unified Device Architecture (CUDA) API is accessed through CuPy, an open-source Python library designed as a drop-in replacement for NumPy on GPUs. The implementation uses the Cholesky linear algebra domain and is done in PyBEST, the Pythonic Black-box Electronic Structure Tool, a fully-fledged modern electronic structure software package. Because video memory (VRAM) is limited, the GPU calculations must be performed batch-wise. We present timing results for several contractions involving large tensors. The CuPy implementation yields a 10-fold speed-up compared with calculations on 36 CPUs. Furthermore, we benchmark several…
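The batch-wise strategy described in the abstract can be sketched as follows. The sketch assumes a ladder-type term r_ijab = Σ_cd t_ijcd (ac|bd), with the electron repulsion integrals in chemists' notation approximated by Cholesky vectors, (ac|bd) ≈ Σ_P L^P_ac L^P_bd. The helper `ladder_term_batched`, the choice to batch over the index a, and the `max_bytes` budget are hypothetical illustrations, not PyBEST's implementation. CuPy is used when a CUDA device is available; otherwise the same code runs with NumPy.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device is present
    xp = cp
except Exception:  # CuPy missing or no GPU: fall back to NumPy
    xp = np


def to_host(array):
    """Copy a device array back to host memory (no-op when running on NumPy)."""
    return array if xp is np else xp.asnumpy(array)


def ladder_term_batched(t2, chol, max_bytes):
    """Hypothetical helper: r_ijab = sum_cd t_ijcd (ac|bd), with (ac|bd) ~ sum_P L^P_ac L^P_bd.

    The full (ac|bd) block (nvirt**4 elements) is never stored. It is rebuilt from
    the Cholesky vectors for a batch of `a` indices at a time, so that each rebuilt
    integral block stays within `max_bytes`.
    """
    nocc, _, nvirt, _ = t2.shape
    t2_d = xp.asarray(t2)      # copy inputs to the device once
    chol_d = xp.asarray(chol)  # shape (ncholesky, nvirt, nvirt)
    out = np.empty((nocc, nocc, nvirt, nvirt))
    bytes_per_a = nvirt**3 * t2.itemsize  # memory for one a-slice of (ac|bd)
    batch = max(1, min(nvirt, max_bytes // bytes_per_a))
    for a0 in range(0, nvirt, batch):
        a1 = min(a0 + batch, nvirt)
        # (ac|bd) for a in [a0, a1): shape (a1 - a0, nvirt, nvirt, nvirt)
        eri_block = xp.einsum("Pac,Pbd->acbd", chol_d[:, a0:a1], chol_d)
        r_block = xp.einsum("ijcd,acbd->ijab", t2_d, eri_block)
        out[:, :, a0:a1, :] = to_host(r_block)  # copy each finished batch back to the host
    return out


nocc, nvirt, ncholesky = 4, 12, 30
rng = np.random.default_rng(2)
t2 = rng.standard_normal((nocc, nocc, nvirt, nvirt))
chol = rng.standard_normal((ncholesky, nvirt, nvirt))

# Budget that fits only 5 a-slices at a time (5 * 12**3 * 8 bytes)
r2 = ladder_term_batched(t2, chol, max_bytes=5 * nvirt**3 * 8)

# Reference: build the full four-index block explicitly (feasible only for tiny sizes)
eri_full = np.einsum("Pac,Pbd->acbd", chol, chol)
r2_ref = np.einsum("ijcd,acbd->ijab", t2, eri_full)
print(np.allclose(r2, r2_ref))  # True
```

Random Cholesky vectors are used here only to check that the batched result matches the unbatched one. The budget covers only the rebuilt integral block. A real implementation would also need to account for the amplitudes, the Cholesky vectors, and any einsum temporaries held in VRAM.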
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Physics of Superconductivity and Magnetism · Computational Physics and Python Applications
