Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths
Edward Hutter, Edgar Solomonik

TL;DR
This paper presents Critter, a framework for approximate autotuning on distributed-memory systems. It uses statistical analysis of execution paths to predict kernel performance, so kernels whose timings are already predictable can be skipped during tuning instead of re-executed.
Contribution
The paper introduces Critter, a profiling tool that automates kernel execution decisions using per-kernel statistical profiles, making autotuning practical at scale.
Findings
Achieves up to a 7.1x speed-up when autotuning distributed-memory linear algebra algorithms.
Provides 98% accuracy in performance prediction.
Reduces kernel execution overhead during tuning by replacing invocations of sufficiently predictable kernels with a statistical model of their execution time.
Abstract
The prohibitive expense of automatic performance tuning at scale has largely limited the use of autotuning to libraries for shared-memory and GPU architectures. We introduce a framework for approximate autotuning that achieves a desired confidence in each algorithm configuration's performance by constructing confidence intervals to describe the performance of individual kernels (subroutines of benchmarked programs). Once a kernel's performance is deemed sufficiently predictable for a set of inputs, subsequent invocations are avoided and replaced with a predictive model of the execution time. We then leverage online execution path analysis to coordinate selective kernel execution and propagate each kernel's statistical profile. This strategy is effective in the presence of frequently-recurring computation and communication kernels, which is characteristic of algorithms in numerical…
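The selective-execution idea in the abstract can be illustrated with a minimal Python sketch. It is not Critter's implementation: the names `KernelProfile` and `run_or_predict`, the normal-approximation interval (z = 1.96), the 5% relative tolerance, and the three-sample minimum are all assumptions chosen for illustration. The sketch keeps running timing statistics per kernel signature and stops executing a kernel once the confidence interval around its mean time is narrow enough, returning the mean as the prediction instead.

```python
import math


class KernelProfile:
    """Running timing statistics for one kernel signature (hypothetical helper)."""

    def __init__(self):
        self.n = 0        # number of recorded executions
        self.mean = 0.0   # running mean of execution time
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def record(self, t):
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = t - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (t - self.mean)

    def half_width(self, z=1.96):
        # Half-width of a normal-approximation confidence interval for the mean.
        # The choice of z is an assumption, not the paper's criterion.
        if self.n < 2:
            return math.inf
        variance = self.m2 / (self.n - 1)
        return z * math.sqrt(variance / self.n)

    def is_predictable(self, tol=0.05, min_samples=3):
        # Predictable once the interval is within tol of the mean (assumed rule).
        if self.n < min_samples or self.mean <= 0.0:
            return False
        return self.half_width() <= tol * self.mean


def run_or_predict(profiles, key, run_kernel, tol=0.05):
    """Execute the kernel, or skip it and return the modeled time.

    Returns (time, executed), where executed is False if the kernel was skipped.
    """
    profile = profiles.setdefault(key, KernelProfile())
    if profile.is_predictable(tol):
        return profile.mean, False
    t = run_kernel()  # run_kernel must return the measured execution time
    profile.record(t)
    return t, True
```

In use, `key` would identify a kernel together with its inputs (for example a routine name and matrix dimensions), so that only invocations with the same signature share a profile. Propagating these profiles along execution paths across processes, as the abstract describes, is outside the scope of this sketch.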
