Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture
Andrey Vladimirov, Cliff Addison

TL;DR
This paper presents optimization techniques for a shallow water equation solver on Intel MIC architecture, achieving significant performance improvements through tuning OpenMP+MPI parameters and communication strategies.
Contribution
It introduces a systematic approach to optimize hybrid OpenMP+MPI CFD code on Intel Xeon Phi, demonstrating substantial scalability and performance gains with minimal code changes.
Findings
90% parallel efficiency up to 8 coprocessors
1.6x performance increase over CPU on a single MIC
5.8x performance gain in multi-node cluster
Abstract
The paper demonstrates the optimization of the execution environment of a hybrid OpenMP+MPI computational fluid dynamics code (shallow water equation solver) on a cluster enabled with Intel Xeon Phi coprocessors. The discussion includes: (1) Controlling the number and affinity of OpenMP threads to optimize access to memory bandwidth; (2) Tuning the inter-operation of OpenMP and MPI to partition the problem for better data locality; (3) Ordering the MPI ranks in a way that directs some of the traffic into faster communication channels; (4) Using efficient peer-to-peer communication between Xeon Phi coprocessors based on the InfiniBand fabric. With tuning, the application has 90% efficiency of parallel scaling up to 8 Intel Xeon Phi coprocessors in 2 compute nodes. For larger problems, scalability is even better, because of the greater computation-to-communication ratio.…
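The scaling claim at the end of the abstract can be illustrated with a small model. The sketch below is not taken from the paper: it assumes a square n x n grid split into horizontal strips (one per MPI rank) with a one-row halo exchanged with each of two neighbours per time step. The function names and the decomposition are hypothetical and chosen only for illustration. Under those assumptions, the work per rank grows as n²/ranks while the halo traffic grows only as n, so the ratio improves as the problem gets larger.

```python
def compute_to_comm_ratio(n: int, ranks: int, halo_rows: int = 1) -> float:
    """Cells updated per cell exchanged, for one interior rank and one time step.

    Assumption (not from the paper): an n x n grid is cut into `ranks`
    horizontal strips, and each interior rank swaps `halo_rows` rows
    with the strip above and the strip below.
    """
    if n <= 0 or ranks <= 0 or halo_rows <= 0:
        raise ValueError("n, ranks and halo_rows must be positive")
    compute_cells = (n / ranks) * n      # cells this rank updates
    comm_cells = 2 * halo_rows * n       # halo cells sent to two neighbours
    return compute_cells / comm_cells


def parallel_efficiency(t_one: float, t_many: float, units: int) -> float:
    """Standard strong-scaling efficiency: T(1) / (units * T(units))."""
    if t_one <= 0 or t_many <= 0 or units <= 0:
        raise ValueError("times and unit count must be positive")
    return t_one / (units * t_many)


if __name__ == "__main__":
    # Doubling the grid side doubles the ratio at a fixed rank count.
    for n in (4096, 8192, 16384):
        print(n, compute_to_comm_ratio(n, ranks=8))
    # A 7.2x speedup on 8 devices corresponds to 90% efficiency.
    print(parallel_efficiency(t_one=100.0, t_many=100.0 / 7.2, units=8))
```

In this simplified model the ratio reduces to n / (2 * ranks * halo_rows), which is why a larger grid at a fixed device count spends proportionally less time communicating. A real solver's decomposition and halo width may differ.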
Taxonomy
Topics: Meteorological Phenomena and Simulations · Computational Fluid Dynamics and Aerodynamics · Tropical and Extratropical Cyclones Research
