Chip-level and multi-node analysis of energy-optimized lattice-Boltzmann CFD simulations
Markus Wittmann, Georg Hager, Thomas Zeiser, Jan Treibig, Gerhard Wellein

TL;DR
This paper analyzes energy-efficient lattice-Boltzmann CFD simulations at chip and multi-node levels, identifying key optimization strategies for reducing energy consumption while maintaining performance.
Contribution
It provides a detailed performance and energy analysis of LBM on multicore processors, offering guidelines for optimizing energy efficiency in parallel CFD simulations.
Findings
High single-core performance is crucial for energy efficiency.
Choosing the optimal number of active cores per chip minimizes energy to solution.
Energy savings of up to 35% are achieved through targeted optimizations.
Abstract
Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice-Boltzmann method (LBM) on an Intel Sandy Bridge cluster as a prototype scenario to investigate if and how single-chip performance and power characteristics can be generalized to the highly parallel case. First we perform an analysis of a sparse-lattice LBM implementation for complex geometries. Using a single-core performance model, we predict the intra-chip saturation characteristics and the optimal operating point in terms of energy to solution as a function of implementation details, clock frequency, vectorization, and number of active cores per chip. We show that high single-core performance and a correct choice of the number of active cores per chip are the essential optimizations for lowest energy to solution at minimal performance degradation. Then we…
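The trade-off between active cores, clock frequency, and energy to solution described in the abstract can be illustrated with a small Python sketch. It is not the authors' model. It assumes that chip performance scales linearly with core count until it hits a memory-bandwidth ceiling, and that chip power is a constant baseline plus a per-core term that grows with clock frequency. All function names, parameter names, and numbers below are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: an ASSUMED saturation and power model,
# not taken from the paper. All parameter values are hypothetical.

def performance(n_cores, f_ghz, perf_per_ghz, perf_sat):
    """Chip performance: linear in cores and clock until the bandwidth ceiling."""
    return min(n_cores * perf_per_ghz * f_ghz, perf_sat)


def chip_power(n_cores, f_ghz, w_base, w_lin, w_quad):
    """Chip power: baseline plus a per-core part that grows with frequency."""
    return w_base + n_cores * (w_lin * f_ghz + w_quad * f_ghz ** 2)


def energy_to_solution(n_cores, f_ghz, work=1.0, perf_per_ghz=10.0,
                       perf_sat=100.0, w_base=25.0, w_lin=1.0, w_quad=1.0):
    """Energy = power * runtime, with runtime = work / performance."""
    perf = performance(n_cores, f_ghz, perf_per_ghz, perf_sat)
    power = chip_power(n_cores, f_ghz, w_base, w_lin, w_quad)
    return power * work / perf


def best_core_count(f_ghz, max_cores=8, **model):
    """Core count in 1..max_cores with the lowest energy to solution."""
    return min(range(1, max_cores + 1),
               key=lambda n: energy_to_solution(n, f_ghz, **model))


if __name__ == "__main__":
    for f in (1.6, 2.7):
        n = best_core_count(f)
        print(f"f = {f} GHz: best core count = {n}, "
              f"energy = {energy_to_solution(n, f):.3f}")
```

In this toy model, adding cores beyond the saturation point raises power without raising performance, so energy to solution goes up again. Higher single-core performance moves the saturation point to fewer cores, which is consistent in spirit with the abstract's emphasis on single-core performance and the number of active cores.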
