A fast vectorized sorting implementation based on the ARM scalable vector extension (SVE)

Bérenger Bramas

arXiv:2105.07782·cs.DC·November 22, 2021

Open Access

TL;DR

This paper presents a fast vectorized sorting implementation for ARM's SVE. By adapting to SVE's predicates and variable vector size, it reports a 4x speedup over the GNU C++ sort.

Contribution

The paper introduces a novel vectorized sorting method tailored to ARM SVE. The method accounts for SVE's predicate-controlled instructions and its variable vector size, and it achieves substantial performance improvements over standard sorting algorithms.

Findings

01. Achieves a 4x speedup over GNU C++ sort

02. Efficiently handles different data types, including integers and doubles

03. Adapts well to ARM SVE's predicates and variable vector size
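
To make "predicate and variable vector size" concrete, here is a minimal sketch in portable C++. It is a scalar emulation, not SVE intrinsics and not the paper's implementation: the helper names `make_predicate` and `compare_exchange` are hypothetical, and the `lanes` parameter stands in for a vector length that is only known at run time. The predicate marks which lanes are active, so the final partial block needs no separate tail loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: emulates a "while less than" style predicate.
// Lane i is active when (start + i) < n.
std::vector<bool> make_predicate(std::size_t start, std::size_t n,
                                 std::size_t lanes) {
    std::vector<bool> pred(lanes);
    for (std::size_t i = 0; i < lanes; ++i) {
        pred[i] = (start + i) < n;
    }
    return pred;
}

// Hypothetical helper: element-wise compare-exchange so that a[i] <= b[i]
// afterwards. `lanes` plays the role of the run-time vector length;
// the code never assumes a fixed width.
void compare_exchange(std::vector<int>& a, std::vector<int>& b,
                      std::size_t lanes) {
    if (lanes == 0) {
        return;  // guard: a zero-width "vector" would never advance
    }
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t start = 0; start < n; start += lanes) {
        const std::vector<bool> pred = make_predicate(start, n, lanes);
        for (std::size_t i = 0; i < lanes; ++i) {
            if (!pred[i]) {
                continue;  // inactive lane: memory is left untouched
            }
            const int lo = std::min(a[start + i], b[start + i]);
            const int hi = std::max(a[start + i], b[start + i]);
            a[start + i] = lo;
            b[start + i] = hi;
        }
    }
}
```

Calling `compare_exchange(a, b, 4)` and `compare_exchange(a, b, 16)` on the same inputs gives the same result, which is the point of writing against a variable vector size: only the number of loop iterations changes, not the logic.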

Abstract

How developers implement their algorithms, and how those implementations behave on modern CPUs, is governed by the design and organization of the CPUs themselves. The vectorization units (SIMD) are among the few parts of a CPU that can and must be controlled explicitly. In the HPC community, x86 CPUs and their vectorization instruction sets were the de facto standard for decades. Each new release of an instruction set usually doubled the vector length and added new operations, and each generation pushed developers to adapt and improve previous implementations. The release of the ARM scalable vector extension (SVE) changed things radically for several reasons. First, we expect ARM processors to equip many supercomputers in the coming years. Second, SVE's interface differs in several respects from the x86 extensions: it provides different instructions, uses a predicate to control most…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies