Multi-Strided Access Patterns to Boost Hardware Prefetching
Miguel O. Blom, Kristian F. D. Rietveld, Rob V. van Nieuwpoort

TL;DR
This paper introduces multi-strided memory access patterns to enhance hardware prefetching, significantly boosting performance of memory-bound kernels across various architectures by improving cache utilization and bandwidth.
Contribution
It proposes a novel multi-strided access transformation that improves prefetcher efficiency and demonstrates substantial performance gains over existing methods on multiple micro-architectures.
Findings
Achieves up to 12.55x speedup over Polly
Improves cache hit ratios and memory bandwidth
Outperforms state-of-the-art libraries on multiple kernels
Abstract
Important memory-bound kernels, such as linear algebra, convolutions, and stencils, rely on SIMD instructions as well as optimizations targeting improved vectorized data traversal and data re-use to attain satisfactory performance. On contemporary CPU architectures, the hardware prefetcher is of key importance for efficient utilization of the memory hierarchy. In this paper, we demonstrate that transforming a memory access pattern consisting of a single stride into one that concurrently accesses multiple strides can boost the utilization of the hardware prefetcher and, in turn, significantly improve the performance of memory-bound kernels. Using a set of micro-benchmarks, we establish that accessing memory in a multi-strided manner enables more cache lines to be concurrently brought into the cache, resulting in improved cache hit ratios and higher effective memory bandwidth without the…
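Example (illustrative sketch)
The abstract's central idea, replacing one sequential stream with several streams that advance together, can be sketched on a simple array reduction. This is not the authors' implementation: the function names, the `NUM_STREAMS` constant, and the choice of splitting the array into equal contiguous chunks are all assumptions made for illustration, and whether it helps on a given machine depends on its prefetcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of concurrent streams; a tuning knob, not a value from the paper. */
#define NUM_STREAMS 4

/* Baseline: a single stream walks the array from start to end. */
static double sum_single_stride(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Multi-strided variant: the array is split into NUM_STREAMS contiguous
 * chunks, and each outer iteration touches the same offset in every chunk.
 * The hardware then sees NUM_STREAMS independent sequential streams. */
static double sum_multi_strided(const double *a, size_t n) {
    assert(a != NULL || n == 0);
    size_t chunk = n / NUM_STREAMS;
    double s = 0.0;
    for (size_t i = 0; i < chunk; i++) {
        for (size_t k = 0; k < NUM_STREAMS; k++) {
            s += a[k * chunk + i];
        }
    }
    /* Remainder elements that did not fit into an equal chunk. */
    for (size_t i = chunk * NUM_STREAMS; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

Both functions visit every element exactly once; only the traversal order differs. Because floating-point addition is not associative, the two results can differ in the last bits for general inputs.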
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed systems and fault tolerance
