Practically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
Christian Knauth, Boran Adas, Daniel Whitfield, Xuesong Wang, Lydia Ickler, Tim Conrad, Oliver Serang

TL;DR
This paper introduces new efficient C++11 methods for the bit-reversed permutation, compares them with existing approaches, and demonstrates the advantages of cache-oblivious and parallelizable strategies on modern hardware.
Contribution
The paper proposes three novel C++11 algorithms for the bit-reversed permutation, including a cache-oblivious recursive method, and evaluates their performance against existing techniques.
Findings
The cache-oblivious method is competitive with the fastest known approach.
The new methods benefit from parallelization on multiple cores and on the GPU.
Theoretical and empirical analyses show improved performance and scalability.
Abstract
The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache-localized matrix buffer. Three new strategies for performing the bit-reversed permutation in C++11 are proposed: an inductive method using the bitwise XOR operation, a template-recursive closed form, and a cache-oblivious template-recursive approach, which reduces the bit-reversed permutation to smaller bit-reversed permutations and a square matrix transposition. These new methods are compared to the extant approaches in terms of theoretical runtime, empirical compile time, and empirical runtime.…
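To make two of the ideas in the abstract concrete, here is a minimal C++11 sketch. It is not the authors' code. The first routine illustrates naive bitwise swapping. The second illustrates only the decomposition behind the cache-oblivious approach: for an even number of index bits, the array is treated as a square row-major matrix, each row is bit-reversed, the matrix is transposed, and each row is bit-reversed again. All function names are hypothetical. The sketch assumes an even bit count for the second routine and uses a plain double-loop transposition, so it is not itself recursive or cache-oblivious.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the lowest `bits` bits of `index`, one bit per iteration.
inline std::size_t reverse_bits(std::size_t index, unsigned bits) {
    std::size_t reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1);
        index >>= 1;
    }
    return reversed;
}

// Naive bitwise swapping: swap each element with its bit-reversed partner.
// The i < j test ensures every pair is swapped exactly once.
template <typename T>
void naive_bit_reversal(std::vector<T>& data, unsigned bits) {
    assert(data.size() == (std::size_t(1) << bits));
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t j = reverse_bits(i, bits);
        if (i < j) std::swap(data[i], data[j]);
    }
}

// Apply a bit-reversed permutation of `half_bits` bits to every row of a
// side x side row-major matrix, where side == 2^half_bits.
template <typename T>
void bit_reverse_rows(std::vector<T>& data, std::size_t side, unsigned half_bits) {
    for (std::size_t r = 0; r < side; ++r) {
        T* row = data.data() + r * side;
        for (std::size_t c = 0; c < side; ++c) {
            const std::size_t rc = reverse_bits(c, half_bits);
            if (c < rc) std::swap(row[c], row[rc]);
        }
    }
}

// Simple in-place transposition of a side x side row-major matrix.
// (A stand-in for illustration; this loop is not cache-oblivious.)
template <typename T>
void transpose_square(std::vector<T>& data, std::size_t side) {
    for (std::size_t r = 0; r < side; ++r)
        for (std::size_t c = r + 1; c < side; ++c)
            std::swap(data[r * side + c], data[c * side + r]);
}

// Bit-reversed permutation expressed as: row reversal, transpose, row reversal.
// Assumption of this sketch: `bits` is even, so the matrix is square.
template <typename T>
void bit_reversal_via_transpose(std::vector<T>& data, unsigned bits) {
    assert(bits % 2 == 0);
    assert(data.size() == (std::size_t(1) << bits));
    const unsigned half_bits = bits / 2;
    const std::size_t side = std::size_t(1) << half_bits;
    bit_reverse_rows(data, side, half_bits);
    transpose_square(data, side);
    bit_reverse_rows(data, side, half_bits);
}
```

The decomposition works because reversing an index made of a high half and a low half yields the reversed low half followed by the reversed high half. Reversing within rows handles each half, and the transposition exchanges the two halves. Calling both routines on the same 16-element input with `bits = 4` should therefore produce identical arrays.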
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Algorithms and Data Compression
