Optimizing GPU Cache Policies for MI Workloads
Johnathan Alsop, Matthew D. Sinclair, Srikant Bharadwaj, Alexandru Dutu, Anthony Gutierrez, Onur Kayiran, Michael LeBeane, Sooraj Puthoor, Xianwei Zhang, Tsung Tai Yeh, and Bradford M. Beckmann

TL;DR
This paper evaluates GPU caching policies across 17 machine intelligence (MI) workloads, finds that no single policy suits all of them, and proposes cache optimizations that consistently match the performance of the best static policy.
Contribution
It characterizes the cache behavior of 17 MI workloads under a range of GPU caching strategies and evaluates a set of cache optimizations that consistently match the performance of the best static policy.
Findings
No single GPU cache policy is optimal for all MI workloads.
The proposed cache optimizations consistently match the performance of the best static caching policies.
The choice of GPU caching policy involves multiple performance trade-offs and interactions.
Abstract
In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important but complicated. As memory demands grow and data movement overheads increasingly limit performance, determining the best GPU caching policy to use for a diverse range of MI workloads represents one important challenge. To study this, we evaluate 17 MI applications and characterize their behaviors using a range of GPU caching strategies. In our evaluations, we find that the choice of caching policy in GPU caches involves multiple performance trade-offs and interactions, and there is no one-size-fits-all GPU caching policy for MI workloads. Based on detailed simulation results, we motivate and evaluate a set of cache optimizations that consistently match the performance of the best static GPU caching policies.
