Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs
Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy

TL;DR
This study compares the vectorization capabilities of various compilers on x86 and ARM CPUs using a modified real-world code suite, revealing inconsistent performance and no clear winner among compilers.
Contribution
It provides a comparative analysis of compiler vectorization effectiveness on x86 and ARM architectures using a realistic benchmark suite.
Findings
GCC reported 54% (x86) and 56% (ARM) of loops as vectorized
ICX reported 50% (x86) and ACFL reported 54% (ARM); Clang reported 46% (x86) and 47% (ARM)
No single compiler consistently outperformed others across platforms
Abstract
Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested the ability of several compilers to vectorize code on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as having been vectorized, ICX reported 50%, and Clang, 46%. On ARM, GCC reported 56% of the loops as having been vectorized, ACFL reported 54%, and Clang, 47%. We found that the vectorized code did not always outperform the unvectorized code. In some cases, given two very similar vectorizable loops, a compiler would…
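As a rough illustration of what "vectorizable" means here, the sketch below contrasts two loops. It is a minimal example under stated assumptions, not code from TSVC2 or from the paper: the function names `add_arrays` and `running_sum` are hypothetical. The first loop has independent iterations, so a compiler is generally free to process several elements per vector instruction. The second carries a dependence from one iteration to the next, which usually prevents straightforward vectorization. Whether a particular compiler actually vectorizes either loop depends on its version, flags, and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not taken from TSVC2.
 * Every iteration is independent: c[i] depends only on a[i] and b[i].
 * `restrict` promises the compiler that the arrays do not overlap,
 * which removes one common obstacle to auto-vectorization. */
void add_arrays(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Hypothetical example of a loop-carried dependence:
 * a[i] needs the value of a[i - 1] written by the previous iteration,
 * so the iterations cannot simply be executed side by side. */
void running_sum(size_t n, float *a)
{
    assert(n == 0 || a != NULL);
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i] + a[i - 1];
}
```

To see what a compiler reports for loops like these, GCC accepts `-fopt-info-vec` and Clang accepts `-Rpass=loop-vectorize` alongside an optimization level such as `-O3`. These are general-purpose diagnostics; the exact flags and methodology used in the study are not stated in this summary.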
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques
