Fine-Grained Vectorized Merge Sorting on RISC-V: From Register to Cache
Jin Zhang, Jincheng Zhou, Xiang Zhang, Di Ma, Chunye Gong

TL;DR
This paper introduces a novel, fine-grained vectorized merge sort algorithm for RISC-V architectures, optimizing register operations and cache utilization to improve sorting efficiency.
Contribution
It develops a cache-aware, vectorized merge sort with innovative register transpose and merge schemes tailored for RISC-V's vector capabilities.
Findings
Achieves improved sorting performance on RISC-V hardware.
Introduces efficient register transpose using data proxy techniques.
Designs an asymmetric merging network for better cache utilization.
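For readers unfamiliar with merging networks, the following is a minimal scalar sketch of a standard (symmetric) bitonic merge. It is not the asymmetric network the paper designs, whose details are not given in this summary. The function name `bitonic_merge` and the equal power-of-two length restriction are assumptions made for illustration. The point is that every compare-exchange position is fixed in advance, which is what lets vector min/max instructions replace branches.

```python
def bitonic_merge(a, b):
    """Merge two sorted lists of equal power-of-two length.

    Illustrative sketch of a standard bitonic merging network,
    not the asymmetric network proposed in the paper.
    """
    n = len(a)
    assert n == len(b) and n > 0 and n & (n - 1) == 0
    # Appending b reversed yields a bitonic sequence (ascending, then descending).
    v = list(a) + list(b)[::-1]
    size = 2 * n
    half = size // 2
    while half >= 1:
        # Each stage compares elements `half` apart inside blocks of 2 * half.
        for start in range(0, size, 2 * half):
            for i in range(start, start + half):
                lo = min(v[i], v[i + half])
                hi = max(v[i], v[i + half])
                v[i], v[i + half] = lo, hi
        half //= 2
    return v


merged = bitonic_merge([1, 4, 6, 7], [2, 3, 5, 8])
```

In a vectorized setting, each stage's inner loop maps to one min and one max over whole registers, but lining up the operands for later stages requires rearranging lanes, which is where shuffle cost enters.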
Abstract
Merge sort, a divide-sort-merge paradigm, is widely used across computer science. Modern reduced instruction set computing architectures such as the fifth-generation RISC-V can treat multiple registers as a vector register group for wide instruction parallelism, so optimizing merge sort for this vectorized property is becoming increasingly common. In this paper, we overhaul the divide-sort-merge paradigm, from its register-level sort to its cache-aware merge, to develop a fine-grained RISC-V vectorized merge sort (RVMS). At the register level, RISC-V lacks an inline vectorized transpose instruction, so implementing a transpose efficiently is non-trivial. In addition, vectorized comparisons do not always work well in merging networks. Both issues stem primarily from the expensive data shuffle instruction. To bypass it, RVMS strives to take register data as the proxy…
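To see why a transpose matters for the register-level sort, here is a minimal scalar sketch of the common "sort columns, then transpose" pattern used by vectorized sorts in general. It models four 4-lane vector registers as Python lists and is not RVMS's proxy-based scheme, which this truncated abstract does not spell out. The names `NETWORK_4`, `compare_exchange`, and `sort_block_4x4` are hypothetical.

```python
# Standard 5-comparator sorting network for 4 inputs (register indices).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def compare_exchange(rows, i, j):
    """Lane-wise min into row i and max into row j (models vector min/max)."""
    lo = [min(x, y) for x, y in zip(rows[i], rows[j])]
    hi = [max(x, y) for x, y in zip(rows[i], rows[j])]
    rows[i], rows[j] = lo, hi


def sort_block_4x4(block):
    """Return four sorted runs of length 4 from a 4x4 block.

    Each row models one vector register. The network sorts every
    column without any lane movement; the transpose then turns the
    sorted columns into sorted rows that a merge step can consume.
    """
    rows = [list(r) for r in block]
    for i, j in NETWORK_4:
        compare_exchange(rows, i, j)
    # Transpose: the step that needs shuffles when no native instruction exists.
    return [list(col) for col in zip(*rows)]


runs = sort_block_4x4([
    [9, 3, 7, 1],
    [4, 8, 2, 6],
    [5, 0, 11, 10],
    [12, 15, 13, 14],
])
```

The compare-exchange stage touches only whole registers, so it is cheap. The final transpose is the only part that moves data between lanes, which is consistent with the abstract's remark that the cost concentrates in data shuffles.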
Taxonomy
Topics: Algorithms and Data Compression
