TL;DR
This paper evaluates three compiler suites on the A64FX CPU that powers Supercomputer Fugaku, and finds that deviating from the recommended usage model of the compute nodes can yield performance gains of orders of magnitude.
Contribution
It provides a comparative analysis of compiler performance on A64FX and demonstrates potential optimizations for better HPC performance.
Findings
Alternative compiler configurations can improve performance by orders of magnitude on some benchmarks.
Compiler choice greatly impacts HPC application efficiency on A64FX.
Deviating from recommended usage models can unlock higher performance.
Abstract
The current number one of the TOP500 list, Supercomputer Fugaku, has demonstrated that CPU-only HPC systems aren't dead and that CPUs can be used for more than just being the host controller for discrete accelerators. While the specifications of the chip and overall system architecture, and the benchmarks submitted to various lists such as TOP500 and Green500, clearly highlight the potential, the proliferation of Arm into the HPC business is rather recent, and hence the software stack might not be fully matured and tuned yet. We test three state-of-the-art compiler suites against a broad set of benchmarks. Our measurements show that orders of magnitude in performance can be gained by deviating from the recommended usage model of the A64FX compute nodes.
