Embracing a new era of highly efficient and productive quantum Monte Carlo simulations

Amrita Mathuriya; Ye Luo; Raymond C. Clay III; Anouar Benali; Luke Shulenburger; Jeongnim Kim

arXiv:1708.02645 · cs.DC · August 10, 2017

TL;DR

This paper details a systematic transformation of QMCPACK to leverage modern CPU hardware, achieving significant speedups, energy savings, and reduced memory usage, thereby enabling larger and more efficient quantum Monte Carlo simulations.

Contribution

The paper introduces a portable, maintainable approach to optimize QMCPACK for modern hardware, including new data layouts and vectorization techniques.
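The array-of-structures (AoS) versus structure-of-arrays (SoA) distinction behind the new data layouts can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions: the names `PosAoS`, `PosSoA`, `dist2_aos`, and `dist2_soa` are hypothetical and do not reflect QMCPACK's actual containers. Whether a particular compiler vectorizes the SoA loop also depends on its flags and target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): x, y, z of one particle sit together, so the
// x values of consecutive particles are sizeof(PosAoS) bytes apart.
struct PosAoS {
  double x, y, z;
};

// Structure-of-arrays (SoA): one contiguous array per coordinate.
// Hypothetical container for illustration, not QMCPACK's own class.
struct PosSoA {
  std::vector<double> x, y, z;

  // Transpose an AoS array into SoA storage.
  explicit PosSoA(const std::vector<PosAoS>& aos)
      : x(aos.size()), y(aos.size()), z(aos.size()) {
    for (std::size_t i = 0; i < aos.size(); ++i) {
      x[i] = aos[i].x;
      y[i] = aos[i].y;
      z[i] = aos[i].z;
    }
  }

  std::size_t size() const { return x.size(); }
};

// Squared distance from r to every particle, reading AoS (strided) data.
void dist2_aos(const std::vector<PosAoS>& p, const PosAoS& r,
               std::vector<double>& out) {
  out.resize(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    const double dx = p[i].x - r.x;
    const double dy = p[i].y - r.y;
    const double dz = p[i].z - r.z;
    out[i] = dx * dx + dy * dy + dz * dz;
  }
}

// Same result from SoA data: every array is read with unit stride, the
// access pattern compilers most readily turn into SIMD instructions.
void dist2_soa(const PosSoA& p, const PosAoS& r, std::vector<double>& out) {
  const std::size_t n = p.size();
  assert(p.y.size() == n && p.z.size() == n);
  out.resize(n);
  const double* xs = p.x.data();
  const double* ys = p.y.data();
  const double* zs = p.z.data();
  double* o = out.data();
  for (std::size_t i = 0; i < n; ++i) {
    const double dx = xs[i] - r.x;
    const double dy = ys[i] - r.y;
    const double dz = zs[i] - r.z;
    o[i] = dx * dx + dy * dy + dz * dz;
  }
}
```

Both functions return the same values. In the AoS loop, however, each SIMD lane would have to gather `x`, `y`, and `z` from strided addresses. In the SoA loop, consecutive iterations touch consecutive doubles in each array, so the loads map directly onto vector registers.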

Findings

01. Achieved up to 4.5x speedup on recent Intel processors and IBM Blue Gene/Q for representative workloads.

02. Reduced energy consumption in proportion to the speedup.

03. Reduced memory footprint by up to 3.8x.
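A one-line energy relation shows why the energy savings track the speedup. It assumes that the average power draw stays roughly the same for the old and new code (our assumption, not stated in the listing). Here $E$ is energy per run, $\bar{P}$ average power, $t$ runtime, and $S$ the speedup:

```latex
E = \bar{P}\,t, \qquad S = \frac{t_{\text{old}}}{t_{\text{new}}}
\quad\Longrightarrow\quad
\frac{E_{\text{new}}}{E_{\text{old}}}
  = \frac{\bar{P}_{\text{new}}}{\bar{P}_{\text{old}}}\cdot\frac{1}{S}
  \approx \frac{1}{S}
\quad\text{when } \bar{P}_{\text{new}} \approx \bar{P}_{\text{old}}.
```

For example, with $S = 4.5$ and unchanged average power, energy per run falls to about $1/4.5 \approx 0.22$ of the original. Vectorized code can draw somewhat more power, which would make the actual saving slightly smaller than $1/S$.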

Abstract

QMCPACK has enabled cutting-edge materials research on supercomputers for over a decade. It scales nearly ideally but has low single-node efficiency because its physics-based abstractions use array-of-structures objects, which vectorize poorly. We present a systematic approach to transform QMCPACK to better exploit the new hardware features of modern CPUs in portable and maintainable ways. We develop miniapps for fast prototyping and optimization. We implement new containers in a structure-of-arrays data layout to facilitate vectorization by compilers. Further speedup and smaller memory footprints are obtained by computing data on the fly with the vectorized routines and by expanding the use of single precision. All of these changes are seamlessly incorporated into production QMCPACK. We demonstrate up to 4.5x speedups on recent Intel processors and IBM Blue Gene/Q for representative workloads.…
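The abstract credits part of the speedup and memory savings to computing data on the fly and to wider use of single precision. The sketch below illustrates that trade-off in general terms only and does not reproduce the paper's specific scheme. It compares the storage for a precomputed N x N table of pair quantities in double precision against a single row recomputed in `float` from SoA coordinates whenever it is needed. The helpers `full_table_bytes`, `row_bytes`, and `dist2_row` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers illustrating the memory trade-off between storing a
// precomputed N x N table of pair quantities and recomputing one row on demand.

// Bytes needed to keep a full N x N table of values of type T in memory.
template <typename T>
std::size_t full_table_bytes(std::size_t n) {
  return n * n * sizeof(T);
}

// Bytes needed for a single row of N values of type T, recomputed on the fly.
template <typename T>
std::size_t row_bytes(std::size_t n) {
  return n * sizeof(T);
}

// Recompute squared distances from particle k to all particles, in single
// precision, from SoA coordinate arrays. The loop is unit-stride and
// branch-free, which suits compiler auto-vectorization.
void dist2_row(const std::vector<float>& x, const std::vector<float>& y,
               const std::vector<float>& z, std::size_t k,
               std::vector<float>& row) {
  const std::size_t n = x.size();
  assert(y.size() == n && z.size() == n && k < n);
  row.resize(n);
  const float xk = x[k], yk = y[k], zk = z[k];
  for (std::size_t j = 0; j < n; ++j) {
    const float dx = x[j] - xk;
    const float dy = y[j] - yk;
    const float dz = z[j] - zk;
    row[j] = dx * dx + dy * dy + dz * dz;
  }
}
```

For N = 1000, `full_table_bytes<double>(1000)` is 8,000,000 bytes, while `row_bytes<float>(1000)` is 4,000 bytes. The cost is recomputing the row each time it is used. That pays off only when the vectorized loop is cheap relative to the memory traffic it avoids. The 3.8x reduction reported for QMCPACK comes from the authors' own combination of changes and is not derived from this sketch.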
