An efficient mixed-precision, hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm

Guangye Chen; Luis Chacón; Daniel C. Barnes

arXiv:1111.5295·physics.plasm-ph·June 3, 2015

TL;DR

This paper presents a highly efficient mixed-precision hybrid CPU-GPU implementation of a fully implicit particle-in-cell algorithm that significantly accelerates simulations while maintaining accuracy and robustness.

Contribution

It introduces a novel mixed-precision hybrid CPU-GPU implementation of an implicit PIC algorithm, achieving high performance and efficiency in large-scale kinetic simulations.

Findings

01

GPU implementation achieves up to 400 GOp/s

02

The GPU implementation runs 300 times faster than serial CPU execution

03

Hybrid solver outperforms CPU-only version by a factor of 100

Abstract

Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been proposed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230,18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle-orbit computations from the field solver, while remaining fully self-consistent. This paper describes a very efficient, mixed-precision hybrid CPU-GPU implementation of the implicit PIC algorithm exploiting this feature. The JFNK solver is kept on the CPU in double precision (DP), while the implicit, charge-conserving, and adaptive particle mover is implemented on a GPU (graphics processing unit) using CUDA in single-precision (SP). Performance-oriented optimizations are introduced…
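
The "Jacobian-free" part of a JFNK solver can be illustrated with a minimal sketch: the Krylov method only needs products of the Jacobian with a vector, and these can be approximated by a finite difference of the nonlinear residual, so the Jacobian matrix is never formed. The example below is not the paper's implementation. The residual `residual`, the helper names, and the perturbation scaling are hypothetical choices made for illustration, with a toy two-unknown system standing in for the field residual.

```python
import math


def residual(u):
    """Hypothetical nonlinear residual F(u) with two unknowns (toy stand-in)."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def jacobian_free_matvec(F, u, v, eps=1e-7):
    """Approximate J(u) @ v by a forward difference of F, without forming J.

    J(u) v ~= (F(u + h v) - F(u)) / h, with h scaled by the sizes of u and v
    (one common heuristic; an assumption, not taken from the paper).
    """
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    h = eps * (1.0 + norm_u) / norm_v
    F_base = F(u)
    F_pert = F([ui + h * vi for ui, vi in zip(u, v)])
    return [(fp - fb) / h for fp, fb in zip(F_pert, F_base)]


def exact_matvec(u, v):
    """Analytic J(u) @ v for the toy residual, used only as a reference."""
    x, y = u
    # J = [[2x, 1], [1, 2y]]
    return [2.0 * x * v[0] + v[1], v[0] + 2.0 * y * v[1]]


u = [1.0, 2.0]
v = [0.5, -1.0]
approx = jacobian_free_matvec(residual, u, v)
exact = exact_matvec(u, v)
print("finite-difference J v:", approx)
print("analytic J v:         ", exact)
```

Each approximate product costs one extra residual evaluation. In an algorithm of the kind the abstract describes, where particle orbits are computed separately from the field solve, that evaluation is presumably where the particle work would enter, which would make it the natural part to move to the GPU.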
