GPU Optimization of Lattice Boltzmann Method with Local Ensemble Transform Kalman Filter

Yuta Hasegawa; Toshiyuki Imamura; Takuya Ina; Naoyuki Onodera; Yuuichi Asahi; Yasuhiro Idomura

arXiv:2308.03310 · physics.flu-dyn · August 8, 2023

TL;DR

This paper presents a GPU-optimized implementation of the lattice Boltzmann method (LBM) combined with the local ensemble transform Kalman filter (LETKF) for data assimilation in fluid dynamics. It achieves significant speedups through optimized data transpose communication and a batched eigenvalue decomposition.

Contribution

The paper introduces a GPU-accelerated implementation that couples LBM and LETKF. This includes a newly developed batched eigenvalue decomposition (EVD) that outperforms existing libraries such as cuSOLVER.

Findings

1. Achieved a 3.80x speedup over the naive implementation, in which the LETKF part is not parallelized.
2. Developed a batched EVD in EigenG that outperforms cuSOLVER.
3. Demonstrated efficient overlapping of data transpose communication with computation and file I/O on GPUs.
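The abstract explains that the data transpose connecting the LBM and LETKF parts is overlapped with other work, based on data dependencies. The pattern can be sketched in Python as a minimal illustration under several assumptions. The ensemble solver is assumed to produce a member-major array of shape (M, Nx). The analysis is assumed to need a grid-point-major array of shape (Nx, M). The grid is assumed to be split into chunks, so that chunk k can be analyzed while chunk k+1 is still being "transferred". The names `transpose_chunk` and `pipelined_analysis` are hypothetical. An in-memory copy stands in for the real communication, which on a GPU cluster would use non-blocking MPI and CUDA streams. Python threads here only illustrate the dependency structure, not real GPU overlap.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def transpose_chunk(fields, lo, hi):
    """Hypothetical stand-in for transpose communication of one chunk:
    gather grid points [lo, hi) from all M members into a (hi - lo, M) array."""
    return np.ascontiguousarray(fields[:, lo:hi].T)

def pipelined_analysis(fields, n_chunks, analyze):
    """Overlap the 'transfer' of chunk k+1 with the analysis of chunk k.

    fields:  (M, Nx) member-major layout, as assumed for the ensemble solver
    analyze: callable mapping a (chunk, M) array to an array of the same shape
    Returns the analyzed data in grid-point-major layout, shape (Nx, M).
    """
    M, Nx = fields.shape
    bounds = np.linspace(0, Nx, n_chunks + 1, dtype=int)
    out = np.empty((Nx, M), dtype=fields.dtype)
    # One worker plays the role of the communication stream.
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(transpose_chunk, fields, bounds[0], bounds[1])
        for k in range(n_chunks):
            chunk = pending.result()  # data dependency: wait for chunk k
            if k + 1 < n_chunks:      # start transferring chunk k+1 ...
                pending = comm.submit(transpose_chunk, fields,
                                      bounds[k + 1], bounds[k + 2])
            # ... while chunk k is analyzed on the main thread
            out[bounds[k]:bounds[k + 1]] = analyze(chunk)
    return out
```

Chunking by grid points works here because an LETKF-style analysis is local. Each grid point's result depends only on its own ensemble values (plus nearby observations), so no chunk has to wait for the whole transpose to finish.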

Abstract

The ensemble data assimilation of computational fluid dynamics simulations based on the lattice Boltzmann method (LBM) and the local ensemble transform Kalman filter (LETKF) is implemented and optimized on a GPU supercomputer based on NVIDIA A100 GPUs. To connect the LBM and LETKF parts, data transpose communication is optimized by overlapping computation, file I/O, and communication based on data dependency in each LETKF kernel. In two-dimensional forced isotropic turbulence simulations with an ensemble size of $M = 64$ and $N_{x} = 128^{2}$ grid points, the optimized implementation achieved a $3.80\times$ speedup over the naive implementation, in which the LETKF part is not parallelized. The main computing kernel of the local problem is the eigenvalue decomposition (EVD) of $M \times M$ real symmetric dense matrices, which is computed by a newly developed batched EVD in…
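The abstract identifies the EVD of $M \times M$ real symmetric matrices as the main kernel of the LETKF local problem. The textbook ensemble-space LETKF update shows where that EVD comes from. For background perturbations Xb' and observation-space perturbations Yb', the update forms A = (M-1)/ρ I + Yb'^T R^{-1} Yb' and decomposes it as A = Q Λ Q^T. It then reuses that single EVD to obtain both the analysis covariance in ensemble space, P = Q Λ^{-1} Q^T, and the transform W = Q ((M-1) Λ^{-1})^{1/2} Q^T.

The sketch below makes several assumptions. The function name `letkf_batched` is hypothetical. Each batch entry is one independent local problem with a diagonal R, and any localization weights are assumed to be folded into `r_inv`. NumPy's stacked `np.linalg.eigh` stands in for the paper's batched EVD in EigenG. This is not the authors' GPU implementation.

```python
import numpy as np

def letkf_batched(Xb, Yb, yo, r_inv, rho=1.0):
    """Hypothetical batched LETKF analysis (ensemble-space formulation).

    Xb:    (B, n, M) background ensembles for B local problems
    Yb:    (B, p, M) background ensembles mapped to observation space
    yo:    (B, p)    local observations
    r_inv: (B, p)    diagonal of R^{-1} (localization weights may be folded in)
    rho:   multiplicative covariance inflation factor
    Returns the analysis ensembles, shape (B, n, M).
    """
    B, n, M = Xb.shape
    xb_mean = Xb.mean(axis=2, keepdims=True)     # (B, n, 1)
    Xb_pert = Xb - xb_mean                       # (B, n, M)
    yb_mean = Yb.mean(axis=2, keepdims=True)     # (B, p, 1)
    Yb_pert = Yb - yb_mean                       # (B, p, M)

    # C = Yb'^T R^{-1}, shape (B, M, p)
    C = np.swapaxes(Yb_pert, 1, 2) * r_inv[:, None, :]
    # A = (M-1)/rho * I + C Yb': a batch of M x M real symmetric matrices
    A = (M - 1) / rho * np.eye(M) + C @ Yb_pert
    # Batched EVD of all B matrices at once: A = Q diag(lam) Q^T
    lam, Q = np.linalg.eigh(A)                   # lam (B, M), Q (B, M, M)
    Qt = np.swapaxes(Q, 1, 2)
    # P = A^{-1} = Q diag(1/lam) Q^T
    P = (Q / lam[:, None, :]) @ Qt
    # W = [(M-1) P]^{1/2} = Q diag(sqrt((M-1)/lam)) Q^T
    W = (Q * np.sqrt((M - 1) / lam)[:, None, :]) @ Qt
    # Mean weight vector w = P C (yo - yb_mean), added to every column of W
    innov = yo[:, :, None] - yb_mean             # (B, p, 1)
    w_mean = P @ (C @ innov)                     # (B, M, 1)
    return xb_mean + Xb_pert @ (W + w_mean)
```

With a linear observation operator and ρ = 1, the analysis mean from this sketch matches the familiar Kalman-gain form x̄b + Xb' Yb'^T (Yb' Yb'^T + (M-1) R)^{-1} (yo - ȳb). Computing one EVD per local problem avoids separate matrix inversion and matrix square-root steps. This reuse is why the EVD dominates the cost of the local problem.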
