Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs

Nazmus Sakib; Tarun Prabhu; Nandakishore Santhi; John Shalf; Abdel-Hameed A. Badawy

arXiv:2502.11906 · cs.PF · February 21, 2025

Open Access

TL;DR

This study compares the vectorization capabilities of various compilers on x86 and ARM CPUs using a modified real-world code suite, revealing inconsistent performance and no clear winner among compilers.

Contribution

It provides a comparative analysis of compiler vectorization effectiveness on x86 and ARM architectures using a realistic benchmark suite.

Findings

01

GCC reported vectorizing 54% (x86) and 56% (ARM) of the loops in the suite

02

ICX reported vectorizing 50% of the loops on x86; ACFL reported 54% on ARM

03

No single compiler consistently outperformed others across platforms

Abstract

Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested the ability of several compilers to vectorize code on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as having been vectorized, ICX reported 50%, and Clang, 46%. On ARM, GCC reported 56% of the loops as having been vectorized, ACFL reported 54%, and Clang, 47%. We found that the vectorized code did not always outperform the unvectorized code. In some cases, given two very similar vectorizable loops, a compiler would…
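To make the idea of automatic vectorization concrete, the following is a minimal sketch in C. It is not taken from TSVC2 or from the paper; the function names `add_arrays` and `running_sum` are hypothetical and chosen only for illustration. The first loop has independent iterations, which is the kind of loop a compiler can typically map onto vector instructions. The second has a loop-carried dependence, which generally prevents straightforward vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not from TSVC2.
 * Each iteration reads b[i] and c[i] and writes a[i]; no iteration
 * depends on another. `restrict` promises the compiler that the
 * arrays do not overlap, which removes one common obstacle to
 * vectorization. */
void add_arrays(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hypothetical example, not from TSVC2.
 * a[i] needs the value of a[i - 1] computed in the previous
 * iteration (a loop-carried dependence), so the iterations cannot
 * simply be executed several at a time. */
void running_sum(float *a, const float *b, size_t n)
{
    assert(a && b);
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

Compilers can be asked to report which loops they vectorized, for example with `gcc -O3 -fopt-info-vec` or `clang -O3 -Rpass=loop-vectorize`. Whether the paper used these exact flags to collect its percentages is an assumption here, not something stated in the abstract.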

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques