PUL: Pre-load in Software for Caches Wouldn't Always Play Along
Arthur Bernhardt, Sajjad Tamimi, Florian Stock, Andreas Koch, Ilia Petrov

TL;DR
This paper explores the limitations and potential of software-based prefetching in modern CPU architectures, especially in near-data processing, to improve system performance by better managing memory latencies.
Contribution
It investigates the effectiveness of software prefetching in post-Moore systems and proposes strategies to enhance compute utilization through compute/IO interleaving.
Findings
Software prefetching can significantly improve performance in near-data processing.
Compute/IO interleaving maximizes compute utilization in intelligent memory systems.
Software prefetching effectiveness increases with newer CPU architectures.
Abstract
Memory latencies and bandwidth are major factors limiting system performance and scalability. Modern CPUs aim at hiding latencies by employing large caches, out-of-order execution, or complex hardware prefetchers. However, software-based prefetching exhibits higher efficiency, improving with newer CPU generations. In this paper, we investigate software-based, post-Moore systems that offload operations to intelligent memories. We show that software-based prefetching has even higher potential in near-data processing settings by maximizing compute utilization through compute/IO interleaving.
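As a generic illustration of software-based prefetching (not the authors' PUL implementation), the following minimal C sketch issues a prefetch hint a few iterations ahead of an indirect (gather) access, so that the memory fetch for a later element can overlap with computation on the current one. It assumes GCC or Clang, which provide `__builtin_prefetch`. The function names and the `PREFETCH_DISTANCE` value are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch. */
#define PREFETCH_DISTANCE 8

/* Baseline: sum data[idx[i]] for i in [0, n) without any hints. */
uint64_t gather_sum_plain(const uint64_t *data, const size_t *idx, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[idx[i]];
    }
    return sum;
}

/* Same computation, but request the element needed PREFETCH_DISTANCE
 * iterations later while working on the current one. The prefetch is
 * only a hint: it changes timing, never the result. */
uint64_t gather_sum_prefetch(const uint64_t *data, const size_t *idx, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read access), locality = 1 (low temporal reuse). */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Both functions return the same sum; any speedup depends on the access pattern, the memory latency, and the work done per iteration, so a suitable prefetch distance has to be measured on the target system.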
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management
