Heterogeneous computing in a strongly-connected CPU-GPU environment: fast multiple time-evolution equation-based modeling accelerated using data-driven approach
Tsuyoshi Ichimura, Kohei Fujita, Muneo Hori, Lalith Maddegedara, Jack Wells, Alan Gray, Ian Karlin, John Linford

TL;DR
This paper introduces a CPU-GPU heterogeneous computing approach for efficiently solving time-evolution PDEs, achieving significant speedups and energy savings while maintaining accuracy, demonstrated on a single node and a supercomputer.
Contribution
The paper presents a novel directive-based heterogeneous computing method that accelerates PDE solutions with guaranteed accuracy and high scalability in CPU-GPU environments.
Findings
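86.4x speedup on a single GH200 node compared with CPU-only execution
32.2-fold reduction in energy-to-solution compared with CPU-only execution (from 9944 J to 309 J)
94.3% weak scaling efficiency up to 1,920 compute nodes on the Alps supercomputer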
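The weak scaling figure can be read with the usual definition of weak scaling efficiency: the per-node problem size is held fixed, and efficiency is the ratio of the baseline run time to the run time at the larger node count. The sketch below is only an illustration of that definition. The function name and both timings are hypothetical and are not taken from the paper, which may also use a baseline other than the one assumed here.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Return weak scaling efficiency as t_base / t_scaled.

    t_base:   run time at the baseline node count (seconds)
    t_scaled: run time at the larger node count, with the same
              per-node problem size (seconds)
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("run times must be positive")
    return t_base / t_scaled


# Hypothetical timings chosen for illustration only (not from the paper).
t_base = 100.0    # seconds at the baseline node count
t_scaled = 106.0  # seconds at the larger node count

print(f"{weak_scaling_efficiency(t_base, t_scaled):.1%}")  # prints 94.3%
```

An efficiency near 100% means the run time barely grows as nodes and total problem size increase together.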
Abstract
We propose a CPU-GPU heterogeneous computing method for solving time-evolution partial differential equation problems many times with guaranteed accuracy, with a short time-to-solution and low energy-to-solution. On a single-GH200 node, the proposed method improved the computation speed by 86.4 and 8.67 times compared to the conventional method run only on CPU and only on GPU, respectively. Furthermore, the energy-to-solution was reduced 32.2-fold (from 9944 J to 309 J) and 7.01-fold (from 2163 J to 309 J) when compared to using only the CPU and only the GPU, respectively. Using the proposed method on the Alps supercomputer, speedups of 51.6-fold and 6.98-fold were attained when compared to using only the CPU and only the GPU, respectively, and a high weak scaling efficiency of 94.3% was obtained up to 1,920 compute nodes. These implementations were realized using directive-based parallel programming models…
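The energy-to-solution ratios quoted in the abstract can be checked directly from the joule figures it reports. The short sketch below does that arithmetic; the function and variable names are illustrative. Because the joule values are rounded, the GPU-only ratio comes out at 7.00 rather than the reported 7.01. The difference is presumably due to rounding in the published values, which is an assumption here.

```python
def fold_reduction(baseline_j: float, proposed_j: float) -> float:
    """Return how many times smaller proposed_j is than baseline_j."""
    if proposed_j <= 0:
        raise ValueError("proposed energy must be positive")
    return baseline_j / proposed_j


# Energy-to-solution values as quoted in the abstract (joules, rounded).
CPU_ONLY_J = 9944.0
GPU_ONLY_J = 2163.0
PROPOSED_J = 309.0

cpu_fold = fold_reduction(CPU_ONLY_J, PROPOSED_J)
gpu_fold = fold_reduction(GPU_ONLY_J, PROPOSED_J)

print(f"vs CPU-only: {cpu_fold:.2f}-fold")  # prints vs CPU-only: 32.18-fold
print(f"vs GPU-only: {gpu_fold:.2f}-fold")  # prints vs GPU-only: 7.00-fold
```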
Taxonomy
Topics: Distributed and Parallel Computing Systems · Neural Networks and Applications · Parallel Computing and Optimization Techniques
