A Hybrid Vectorized Merge Sort on ARM NEON
Jincheng Zhou, Jin Zhang, Xiang Zhang, Tiaojie Xiao, Di Ma, Chunye Gong

TL;DR
This paper introduces NEON-MS, a hybrid vectorized merge sort optimized for ARM NEON, achieving significant speed improvements over standard and parallel sorting algorithms through register optimization and efficient merge network structures.
Contribution
It presents a novel hybrid vectorized merge sort tailored for ARM NEON, with optimized register usage and improved merge network structures for high efficiency.
Findings
NEON-MS is 3.8 times faster than std::sort.
NEON-MS is 2.1 times faster than boost::block_sort.
NEON-MS achieves a 1.25x speedup over the parallel version of boost::block_sort.
Abstract
Sorting algorithms are among the most extensively researched topics in computer science and serve numerous practical applications. Although various sorts have been proposed for efficiency, different architectures lend distinct flavors to the implementation of parallel sorting. In this paper, we propose a hybrid vectorized merge sort on ARM NEON, named NEON Merge Sort (NEON-MS) for short. Specifically, according to the granted register functions, we first identify the optimal number of registers, which avoids the register-to-memory accesses caused by writing back intermediate results. More importantly, following the generic merge sort framework, which primarily uses a sorting network for column sort and merging networks for three types of vectorized merge, we further improve their structures for high efficiency in a unified asymmetric way: 1) it makes the optimal sorting networks with few comparators…
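The generic framework the abstract builds on (column sort with a sorting network, then merging networks over vector registers) can be illustrated with a small portable sketch. This is not the authors' NEON-MS code. It is an assumption-laden illustration: a 128-bit NEON register holding four 32-bit lanes is simulated with a hypothetical `Vec4` type, and the helpers `vmin`, `vmax`, `cmpx`, `column_sort`, `transpose` and `bitonic_merge` are hypothetical names. On real hardware the lane-wise min/max would map to intrinsics such as `vminq_s32`/`vmaxq_s32`, and the transpose to lane-permute instructions. The merge step uses a textbook bitonic merge, which may differ from the asymmetric networks the paper proposes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-in for a 128-bit NEON register with four int32 lanes.
using Vec4 = std::array<int32_t, 4>;

// Lane-wise min/max: portable analogues of vminq_s32 / vmaxq_s32.
Vec4 vmin(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = std::min(a[i], b[i]);
    return r;
}
Vec4 vmax(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = std::max(a[i], b[i]);
    return r;
}

// Compare-exchange two whole registers: afterwards a[i] <= b[i] in every lane.
void cmpx(Vec4& a, Vec4& b) {
    Vec4 lo = vmin(a, b), hi = vmax(a, b);
    a = lo;
    b = hi;
}

// Column sort: the 5-comparator sorting network for 4 inputs, applied lane-wise
// across 4 registers, so each of the 4 columns ends up sorted top to bottom.
void column_sort(std::array<Vec4, 4>& r) {
    cmpx(r[0], r[1]); cmpx(r[2], r[3]);
    cmpx(r[0], r[2]); cmpx(r[1], r[3]);
    cmpx(r[1], r[2]);
}

// Transpose the 4x4 block so each register holds one sorted column (a run of 4).
void transpose(std::array<Vec4, 4>& r) {
    for (int i = 0; i < 4; ++i)
        for (int j = i + 1; j < 4; ++j) std::swap(r[i][j], r[j][i]);
}

// Bitonic merge of two sorted registers into a sorted run of 8:
// afterwards a holds the 4 smallest values and b the 4 largest, both ascending.
void bitonic_merge(Vec4& a, Vec4& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    Vec4 rb = {b[3], b[2], b[1], b[0]};       // reverse b so a ++ rb is bitonic
    Vec4 lo = vmin(a, rb), hi = vmax(a, rb);  // half-cleaner: lo <= hi, both bitonic
    for (Vec4* v : {&lo, &hi}) {
        Vec4& x = *v;
        // Half-cleaners with stride 2, then stride 1, sort a bitonic run of 4.
        for (int i : {0, 1}) if (x[i] > x[i + 2]) std::swap(x[i], x[i + 2]);
        for (int i : {0, 2}) if (x[i] > x[i + 1]) std::swap(x[i], x[i + 1]);
    }
    a = lo;
    b = hi;
}
```

For example, loading 16 values into four `Vec4`s, calling `column_sort` and then `transpose` yields four sorted runs of 4; `bitonic_merge(r[0], r[1])` then turns two of them into one sorted run of 8 spread across two registers. Keeping all intermediate values in registers like this is what the abstract's register-count tuning aims to preserve, since spilling them to memory would add the write-back traffic it mentions.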
Taxonomy
Topics: Neural Networks and Applications
