An alternative GPU acceleration for a pseudopotential plane-waves density functional theory code with applications to metallic systems
Xuejun Gong, Andrea Dal Corso

TL;DR
This paper introduces a novel GPU acceleration method for pseudopotential plane-waves density functional theory calculations, optimized for metallic systems with many k points, demonstrating significant performance improvements over existing CPU and GPU-accelerated approaches.
Contribution
The authors develop a new GPU implementation in CUDA Fortran that parallelizes Hamiltonian application across wave-vectors, enhancing efficiency for metallic systems with small unit cells.
Findings
Significant speed-up over CPU calculations.
Improved performance compared to existing GPU-accelerated methods.
Effective parallelization of Hamiltonian application on GPU threads.
Abstract
We present an alternative GPU acceleration for plane-wave pseudopotential electronic structure codes, designed for systems that have small unit cells but require a large number of k points to sample the Brillouin zone, as happens, for instance, in metals. We discuss the diagonalization of the Kohn-Sham equations and the solution of the linear system derived in density functional perturbation theory. Both problems benefit from rewriting the routine that applies the Hamiltonian to the Bloch wave-functions so that it works simultaneously (in parallel on the GPU threads) on wave-functions with different wave-vectors k, as many as the GPU memory allows. Our implementation is written in CUDA Fortran and makes extensive use of kernel routines that run on the GPU (GLOBAL routines) or can be called from inside the GPU threads (DEVICE routines). We compare our method with the CPUs…
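The batching idea in the abstract can be illustrated with a small sketch. It is not the authors' CUDA Fortran code: it is a plain Python emulation in which a flat loop index plays the role of a GPU thread, one per (k point, band, plane wave) triple. Only the kinetic term of the Hamiltonian is shown, in Hartree atomic units; the local and nonlocal potential terms are omitted. The function names, the array layout, and the memory estimate (16 bytes per complex coefficient, three work arrays per k point) are all assumptions made for illustration.

```python
# Hypothetical sketch: names, layout, and memory model are illustrative only.

def kpoints_per_batch(mem_bytes, npw, nbnd, bytes_per_coeff=16, work_arrays=3):
    """Estimate how many k points fit in a given memory budget.

    Assumes each k point needs `work_arrays` complex arrays of
    npw * nbnd coefficients (an assumption, not the paper's layout).
    """
    per_k = npw * nbnd * bytes_per_coeff * work_arrays
    return max(1, mem_bytes // per_k)


def apply_kinetic_batched(psi, kpts, gvecs):
    """Apply the kinetic term 0.5 * |k + G|^2 to psi[ik][ib][ig].

    The single flat loop emulates a GPU grid with one thread per
    (ik, ib, ig), so all k points in the batch are handled together.
    """
    nk, nb, npw = len(psi), len(psi[0]), len(gvecs)
    hpsi = [[[0j] * npw for _ in range(nb)] for _ in range(nk)]
    for tid in range(nk * nb * npw):
        # Decode the flat "thread" index into (k point, band, plane wave).
        ik, rem = divmod(tid, nb * npw)
        ib, ig = divmod(rem, npw)
        kg = [k + g for k, g in zip(kpts[ik], gvecs[ig])]
        ekin = 0.5 * sum(c * c for c in kg)
        hpsi[ik][ib][ig] = ekin * psi[ik][ib][ig]
    return hpsi


# Two k points, one band, two plane waves.
kpts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
gvecs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
psi = [[[1 + 0j, 1 + 0j]] for _ in kpts]
hpsi = apply_kinetic_batched(psi, kpts, gvecs)
```

On a real GPU the index decoding would happen inside a kernel, and the batch size returned by a function like `kpoints_per_batch` would decide how many k points are sent to the device at once.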
