FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

Michael Dang'ana

arXiv:2605.10390·cs.DC·May 13, 2026

TL;DR

FractalSortCPU introduces a bandwidth-efficient, in-place radix sort algorithm optimized for CPUs that outperforms existing methods by reducing data pre-processing and leveraging SIMD acceleration.

Contribution

The paper presents a novel CPU-adapted histogram compression scheme for radix sorting that improves performance and bandwidth efficiency on large-scale datasets.

Findings

01

Achieves up to 6x bandwidth efficiency improvement over state-of-the-art.

02

Reduces latency by eliminating input bucketing and data pre-processing.

03

Demonstrates superior performance on CPU, GPU, and FPGA across various data sizes.

Abstract

Cloud database systems, particularly their middleware and query execution layers, use sorting as a core operation in query processing, indexing, and join execution. Radix sort is preferred for large datasets because it outperforms comparison-based algorithms, but state-of-the-art variants suffer from distribution dependence and limited parallelism. Common remedies (multi-pass bucketing, stochastic sampling, and dependence graph structures) incur data pre-processing costs and a larger memory footprint, which makes them less suitable for the large-scale workloads common in cloud environments. In-place radix sort schemes need more passes as key precision increases, which increases latency. Our work addresses these problems by introducing a CPU-adapted histogram compression scheme for radix sorting of arbitrary-precision keys…
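For readers unfamiliar with the baseline, the sketch below is a textbook least-significant-digit radix sort in Python. It is not the paper's compressed-histogram scheme, and it is out-of-place rather than in-place; it is included only to illustrate the per-pass histogram and how the number of passes grows with key width. The function name `lsd_radix_sort` and the parameters `key_bits` and `radix_bits` are assumptions made for this illustration.

```python
def lsd_radix_sort(keys, key_bits=32, radix_bits=8):
    """Textbook LSD radix sort for non-negative integers below 2**key_bits.

    Illustrative baseline only; returns the sorted list and the pass count.
    """
    buckets = 1 << radix_bits
    mask = buckets - 1
    passes = 0
    for shift in range(0, key_bits, radix_bits):
        # Histogram: count how many keys fall into each digit bucket.
        counts = [0] * buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Exclusive prefix sum turns counts into starting offsets.
        offsets = [0] * buckets
        total = 0
        for digit, count in enumerate(counts):
            offsets[digit] = total
            total += count

        # Stable scatter into a fresh output buffer (hence out-of-place).
        out = [0] * len(keys)
        for k in keys:
            digit = (k >> shift) & mask
            out[offsets[digit]] = k
            offsets[digit] += 1

        keys = out
        passes += 1
    return keys, passes


if __name__ == "__main__":
    data = [170, 45, 75, 90, 802, 24, 2, 66]
    sorted_16, passes_16 = lsd_radix_sort(data, key_bits=16)
    sorted_64, passes_64 = lsd_radix_sort(data, key_bits=64)
    print(sorted_16, passes_16, passes_64)
```

With an 8-bit radix, 16-bit keys take 2 passes and 64-bit keys take 8, and every pass reads and writes the full dataset. That per-pass traffic is the cost that grows with key precision in a conventional radix sort.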
