A High-Performance Design for Hierarchical Parallelism in the QMCPACK Monte Carlo code

Ye Luo; Peter Doak; Paul Kent

arXiv:2209.14487 · physics.comp-ph · April 19, 2023 · HiPar@SC · 1 citation

TL;DR

This paper presents a new hierarchical parallelism design for QMCPACK that enhances GPU utilization, improves performance across hardware, and simplifies code maintenance, thereby boosting scientific productivity.

Contribution

The paper introduces a new parallelism architecture for QMCPACK that better exploits hierarchical hardware, including GPUs and CPUs, improves efficiency, and supports falling back to CPU execution.

Findings

01. Higher GPU occupancy with crowds of Monte Carlo walkers
02. Enhanced performance across heterogeneous architectures
03. Support for fallback to CPU execution
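The crowd idea in finding 01 can be illustrated with a minimal C++ sketch, assuming that walkers are split round-robin into crowds, that each crowd is driven by one host thread, and that all walkers in a crowd are updated by a single batched call. Every name here (`Walker`, `batched_evaluate`, `make_crowds`, `run_crowds`) is hypothetical and not QMCPACK's API, and the batched call is a plain CPU loop standing in for a GPU offload.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical walker state: a position and a computed value.
struct Walker {
  double x = 0.0;
  double value = 0.0;
};

// Hypothetical batched kernel: in a GPU build this would be one offload
// covering every walker in the crowd; here it is an ordinary host loop.
void batched_evaluate(const std::vector<Walker*>& crowd, std::atomic<int>& launches) {
  if (crowd.empty()) return;  // nothing to launch for an empty crowd
  launches.fetch_add(1);      // one "launch" per crowd, not per walker
  for (Walker* w : crowd) w->value = w->x * w->x;
}

// Assign walkers to num_crowds crowds round-robin (sizes differ by at most one).
std::vector<std::vector<Walker*>> make_crowds(std::vector<Walker>& walkers, std::size_t num_crowds) {
  assert(num_crowds > 0);
  std::vector<std::vector<Walker*>> crowds(num_crowds);
  for (std::size_t i = 0; i < walkers.size(); ++i)
    crowds[i % num_crowds].push_back(&walkers[i]);
  return crowds;
}

// Drive each crowd from its own host thread; returns the number of batched launches.
int run_crowds(std::vector<Walker>& walkers, std::size_t num_crowds) {
  auto crowds = make_crowds(walkers, num_crowds);
  std::atomic<int> launches{0};
  std::vector<std::thread> threads;
  threads.reserve(crowds.size());
  for (auto& crowd : crowds)  // crowds hold disjoint walkers, so no data race
    threads.emplace_back([&crowd, &launches] { batched_evaluate(crowd, launches); });
  for (auto& t : threads) t.join();
  return launches.load();
}
```

With 10 walkers and 4 crowds, `run_crowds` issues 4 batched calls rather than 10 per-walker calls. In this sketch, larger crowds mean fewer launches, while more crowds give more host threads independent work to submit; how QMCPACK actually sizes its crowds is not described here.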

Abstract

We introduce a new high-performance design for parallelism within the Quantum Monte Carlo code QMCPACK. We demonstrate that the new design exploits the hierarchical parallelism of heterogeneous architectures better than the previous GPU implementation. The new version achieves higher GPU occupancy through the new concept of crowds of Monte Carlo walkers and by enabling more host CPU threads to offload work to the GPU effectively. The higher performance is expected to hold independent of the underlying hardware, significantly improving developer productivity and reducing code maintenance costs. Scientific productivity is also improved by full support for falling back to CPU execution when GPU implementations are not available or when CPU execution is more efficient.
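The CPU fallback described at the end of the abstract can be sketched as a backend selection policy behind a common interface. This is a minimal C++ illustration under stated assumptions: `Evaluator`, `CpuEvaluator`, `GpuEvaluator`, `make_evaluator`, and `check_backends_agree` are hypothetical names, the "GPU" class runs on the host so the sketch builds without a GPU toolchain, and the three boolean flags stand in for whatever runtime checks a real build would perform.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface for one computational component; not QMCPACK's API.
class Evaluator {
 public:
  virtual ~Evaluator() = default;
  virtual std::string name() const = 0;
  virtual double evaluate(const std::vector<double>& x) const = 0;
};

// Reference CPU path: always available.
class CpuEvaluator : public Evaluator {
 public:
  std::string name() const override { return "cpu"; }
  double evaluate(const std::vector<double>& x) const override {
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return sum;
  }
};

// Stand-in for an offloaded path; it runs the same loop on the host
// so that this sketch compiles without a GPU toolchain.
class GpuEvaluator : public Evaluator {
 public:
  std::string name() const override { return "gpu"; }
  double evaluate(const std::vector<double>& x) const override {
    double sum = 0.0;
    for (double v : x) sum += v * v;
    return sum;
  }
};

// Hypothetical policy: use the GPU path only when a device is present, a GPU
// implementation exists for this component, and CPU execution is not preferred.
std::unique_ptr<Evaluator> make_evaluator(bool device_present, bool has_gpu_impl, bool prefer_cpu) {
  if (device_present && has_gpu_impl && !prefer_cpu)
    return std::make_unique<GpuEvaluator>();
  return std::make_unique<CpuEvaluator>();
}

// Fallback is only transparent if both paths produce the same result.
void check_backends_agree(const std::vector<double>& x) {
  CpuEvaluator cpu;
  GpuEvaluator gpu;
  assert(cpu.evaluate(x) == gpu.evaluate(x));
}
```

Keeping a complete CPU path behind the same interface means a component without a GPU implementation still runs; `check_backends_agree` reflects the assumption that both paths must agree for the fallback to be invisible to the user.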

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems