Fine-Grained Vectorized Merge Sorting on RISC-V: From Register to Cache

Jin Zhang; Jincheng Zhou; Xiang Zhang; Di Ma; Chunye Gong

arXiv:2410.00455 · cs.DC · October 2, 2024

Open Access

TL;DR

This paper introduces RVMS, a fine-grained vectorized merge sort for RISC-V that reworks the divide-sort-merge paradigm from the register-level sort up to the cache-aware merge to improve sorting efficiency.

Contribution

The paper develops a cache-aware, vectorized merge sort with register transpose and merge schemes tailored to RISC-V's vector register groups.

Findings

01. Achieves improved sorting performance on RISC-V hardware.

02. Introduces efficient register transpose using data proxy techniques.

03. Designs an asymmetric merging network for better cache utilization.

Abstract

Merge sort, as a divide-sort-merge paradigm, has been widely applied across computer science. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group for wide instruction parallelism, optimizing merge sort with this vectorized property is becoming increasingly common. In this paper, we overhaul the divide-sort-merge paradigm, from its register-level sort to the cache-aware merge, to develop a fine-grained RISC-V vectorized merge sort (RVMS). From the register-level view, RISC-V lacks an inline vectorized transpose instruction, so implementing the transpose efficiently is non-trivial. Besides, vectorized comparisons do not always work well in the merging networks. Both issues primarily stem from the expensive data shuffle instruction. To bypass it, RVMS strides to take register data as the proxy…
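To make the divide-sort-merge pipeline described above more concrete, here is a minimal scalar sketch of the generic in-register scheme that vectorized merge sorts commonly build on: a sorting network applied lane-wise across registers, a transpose that turns sorted columns into sorted runs, and a bitonic merging network. It is not the RVMS implementation. The 4-lane "registers" modeled as Python lists, the 5-comparator network, and the names `compare_exchange`, `in_register_sort`, `bitonic_merge`, and `sort_block_16` are all assumptions made for illustration, and the sketch does not attempt the paper's data-proxy transpose or its asymmetric merging network.

```python
from typing import List

# A standard 5-comparator sorting network for 4 inputs (illustrative choice).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def compare_exchange(regs: List[List[int]], i: int, j: int) -> None:
    """Lane-wise min/max between two modeled registers (equal-length lists)."""
    lo = [min(a, b) for a, b in zip(regs[i], regs[j])]
    hi = [max(a, b) for a, b in zip(regs[i], regs[j])]
    regs[i], regs[j] = lo, hi


def in_register_sort(block: List[int]) -> List[List[int]]:
    """Turn 16 values into four sorted runs of length 4."""
    assert len(block) == 16
    # Load four 4-lane "registers" row by row.
    regs = [block[r * 4:(r + 1) * 4] for r in range(4)]
    # The network sorts every lane (column) across the four registers.
    for i, j in NETWORK_4:
        compare_exchange(regs, i, j)
    # Transpose so that each register holds one sorted run.
    return [list(col) for col in zip(*regs)]


def bitonic_merge(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted runs of equal power-of-two length."""
    assert len(a) == len(b)
    seq = a + b[::-1]  # ascending followed by descending is bitonic
    n = len(seq)
    step = n // 2
    while step >= 1:
        for start in range(0, n, 2 * step):
            for k in range(start, start + step):
                # Scalar compare-and-swap of two elements.
                if seq[k] > seq[k + step]:
                    seq[k], seq[k + step] = seq[k + step], seq[k]
        step //= 2
    return seq


def sort_block_16(block: List[int]) -> List[int]:
    """Sort 16 values: in-register sort, then two rounds of merging."""
    runs = in_register_sort(block)
    left = bitonic_merge(runs[0], runs[1])
    right = bitonic_merge(runs[2], runs[3])
    return bitonic_merge(left, right)
```

In this model the transpose is a single `zip`, which hides exactly the cost the abstract points at: on real vector hardware that step typically needs data shuffles, which is the expense RVMS sets out to avoid.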

Taxonomy

Topics: Algorithms and Data Compression