A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert, Oscar Hernandez

TL;DR
This paper demonstrates how LLVM-based tools can be used to tune the SIMD code of the DCA++ application for the ARM A64FX processor, reaching a 1.98X speedup and 78 GFlops through manual tuning, with parts of that effort targeted for automation.
Contribution
The paper introduces a methodology for LLVM-based optimization of SIMD code targeting new architectures, including automation efforts with the OpenMP Advisor tool.
Findings
Code speed increased by 1.98X
Achieved 78 GFlops performance
Developed automated optimization approaches
Abstract
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical cluster approximation) application that targets the new ARM A64FX processor. The goal is to describe the changes required for the new architecture and generate efficient single instruction/multiple data (SIMD) instructions that target the new Scalable Vector Extension instruction set. During manual tuning, the authors used the LLVM tools to improve code parallelization by using OpenMP SIMD, refactored the code and applied transformations that enabled SIMD optimizations, and ensured that the correct libraries were used to achieve optimal performance. By applying these code changes, code speed was increased by 1.98X and 78 GFlops were achieved on the A64FX processor. The authors aim to automate parts of the efforts in the OpenMP Advisor tool, which is built on top of existing and newly introduced LLVM…
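As a rough illustration of the OpenMP SIMD annotations the abstract refers to, the following minimal C sketch marks two loops for vectorization. The kernels `axpy_simd` and `dot_simd` are hypothetical examples chosen for brevity; they are not taken from DCA++ and do not represent the authors' actual code changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from DCA++): y[i] += a * x[i].
 * `restrict` tells the compiler that x and y do not alias, and
 * `#pragma omp simd` asks it to vectorize the loop. */
void axpy_simd(size_t n, double a, const double *restrict x,
               double *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

/* Hypothetical kernel (not from DCA++): dot product of x and y.
 * The reduction clause lets the compiler keep partial sums in
 * vector lanes and combine them after the loop. */
double dot_simd(size_t n, const double *restrict x,
                const double *restrict y)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

One plausible way to check whether such loops were vectorized is to compile with Clang's optimization remarks, for example `clang -O3 -fopenmp-simd -mcpu=a64fx -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c kernels.c`. The file name `kernels.c` and this exact flag combination are assumptions for illustration; the abstract does not state the command line the authors used.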
