Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores
Yong-Xian Wang, Li-Lun Zhang, Wei Liu, Xing-Hua Cheng, Yu Zhuang, Anthony T. Chronopoulos

TL;DR
This paper presents performance optimization techniques for large-scale CFD applications on hybrid CPU+MIC supercomputers, achieving near-linear scaling on systems with over a million cores.
Contribution
It introduces new parallelization and tuning strategies for CFD codes on heterogeneous CPU+MIC systems, enabling efficient utilization of massive core counts.
Findings
Successfully scaled CFD simulations to 780 billion grid cells.
Achieved near-linear scaling on Tianhe-2 with over 1.3 million cores.
Demonstrated effective load balancing and communication optimization.
Abstract
For computational fluid dynamics (CFD) applications with a large number of grid points/cells, parallel computing is a common and efficient strategy for reducing computational time. Achieving the best performance on modern supercomputer systems, especially those with heterogeneous computing resources such as hybrid CPU+GPU or CPU + Intel Xeon Phi (MIC) co-processors, remains a great challenge. An in-house parallel CFD code capable of simulating three-dimensional structured-grid applications is developed and tested in this study. Several methods of parallelization, performance optimization, and code tuning, for both the CPU-only homogeneous system and the heterogeneous system, are proposed based on identifying potential parallelism of applications, balancing the work load among all kinds of computing devices, tuning the multi-thread code toward better performance in intra-machine node…
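The abstract mentions balancing the work load among the different kinds of computing devices. As a rough illustration only, and not the authors' actual scheme, one simple static approach is to divide grid cells among devices in proportion to each device's measured throughput. The function name `split_cells` and the throughput figures below are hypothetical, chosen purely for this sketch.

```python
from fractions import Fraction


def split_cells(total_cells, rates):
    """Divide total_cells among devices in proportion to their rates.

    Hypothetical helper for illustration; `rates` holds one measured
    throughput per device (for example, cells updated per second).
    """
    if total_cells < 0 or not rates or any(r <= 0 for r in rates):
        raise ValueError("need a non-negative cell count and positive rates")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    fracs = [Fraction(r) for r in rates]
    total_rate = sum(fracs)
    exact = [total_cells * f / total_rate for f in fracs]

    # Start from the rounded-down share of each device.
    shares = [int(x) for x in exact]
    leftover = total_cells - sum(shares)

    # Largest-remainder rule: hand the leftover cells, one each, to the
    # devices whose exact share was cut the most by rounding down.
    order = sorted(range(len(rates)),
                   key=lambda i: exact[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Assumed example: one CPU and one co-processor that is 3x faster.
    print(split_cells(1000, [1.0, 3.0]))
```

A static split like this only helps when per-cell cost is uniform and the measured rates are stable; it says nothing about communication cost, which the paper treats as a separate optimization.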
