# Massively parallel lattice-Boltzmann codes on large GPU clusters

**Authors:** E. Calore, A. Gabbana, J. Kraus, E. Pellegrini, S.F. Schifano, R. Tripiccione

arXiv: 1703.00185 · 2017-03-02

## TL;DR

This paper presents a highly optimized, scalable GPU-based lattice-Boltzmann code for thermal fluid simulations, demonstrating high performance and providing a methodology for developing efficient HPC applications.

## Contribution

The paper introduces a new massively parallel GPU code for thermal lattice-Boltzmann simulations, with detailed performance analysis and optimization strategies for large GPU clusters.

## Key findings

- Delivers a sustained performance of several tens of Tflops
- Demonstrates good scaling on large GPU clusters
- Provides a design and optimization methodology, guided by performance models and benchmarks, that can be reused for other computational physics applications

## Abstract

This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method. Our code has been carefully optimized for performance on one GPU and to have a good scaling behavior extending to a large number of GPUs. Versions of this code have been already used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task as codes must adapt to increasingly parallel architectures, and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally we compare the results of our GPU code with those measured on other currently available high performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops as well as a design and optimization methodology that can be used for the development of other high performance applications for computational physics.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.00185/full.md

## Figures

30 figures with captions in the complete paper: https://tomesphere.com/paper/1703.00185/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1703.00185/full.md

---
Source: https://tomesphere.com/paper/1703.00185