Autotuning Benchmarking Techniques: A Roofline Model Case Study
Jacob Odgård Tørring, Jan Christian Meyer, Anne C. Elster

TL;DR
This paper introduces an autotuning tool that benchmarks the DGEMM operation for the Roofline model. The tool significantly reduces search time while maintaining accuracy, and the paper demonstrates its effectiveness across several hardware architectures.
Contribution
The paper presents a novel autotuning approach that uses confidence intervals and early stopping to efficiently find optimal benchmarking configurations for the target hardware, speeding up the search by up to 116x.
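The summary names confidence intervals and early stopping but does not give their exact form. Below is a minimal sketch of one plausible reading. It assumes each benchmark run returns a throughput in GFLOP/s and is repeated only until an approximate 95% confidence interval around the mean is narrower than a relative tolerance. Every configuration is then measured exhaustively, and the fastest one is kept. The names `measure`, `search`, `rel_tol`, `min_runs` and `max_runs` are hypothetical, and the normal-approximation z-value stands in for whatever statistic the authors actually use.

```python
import math
import statistics
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch, not the authors' implementation.
# Two-sided 95% z-value (normal approximation; a Student-t quantile
# would be more accurate for very small sample counts).
Z95 = statistics.NormalDist().inv_cdf(0.975)


def confidence_interval(samples: List[float]) -> Tuple[float, float]:
    """Return (mean, half_width) of an approximate 95% CI for the mean."""
    mean = statistics.fmean(samples)
    if len(samples) < 2:
        return mean, math.inf  # spread is undefined with a single sample
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, Z95 * sem


def measure(run: Callable[[], float], rel_tol: float = 0.02,
            min_runs: int = 3, max_runs: int = 50) -> Tuple[float, float, int]:
    """Repeat `run` (one timed benchmark returning GFLOP/s) and stop early
    once the CI half-width is within rel_tol of the mean."""
    samples: List[float] = []
    while len(samples) < max_runs:
        samples.append(run())
        if len(samples) >= min_runs:
            mean, half = confidence_interval(samples)
            if half <= rel_tol * abs(mean):
                break  # early stop: the estimate is already precise enough
    mean, half = confidence_interval(samples)
    return mean, half, len(samples)


def search(configs: Dict[str, Callable[[], float]], **kw) -> Optional[str]:
    """Exhaustively measure every configuration; return the fastest one's name."""
    best_name: Optional[str] = None
    best_mean = -math.inf
    for name, run in configs.items():
        mean, _, _ = measure(run, **kw)
        if mean > best_mean:
            best_name, best_mean = name, mean
    return best_name
```

With this design, stable configurations stop after `min_runs` repetitions, while noisy ones keep running until the interval tightens or `max_runs` is reached. That is where the time savings of early stopping would come from in this sketch.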
Findings
Reduces benchmarking search time by up to 116.33x.
Maintains less than 2% error compared to hand-tuned parameters.
Effective across multiple hardware architectures.
Abstract
Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly vendor-optimized and of great interest for empirical benchmarks. In this paper we show how to build a novel tool that autotunes the benchmarking process for the Roofline model. Our novel approach can efficiently and reliably find optimal configurations for any target hardware. Results of our tool on a range of hardware architectures and comparisons to theoretical peak performance are included. Our tool autotunes the benchmarks for the target architecture by deciding the optimal parameters through state space reductions and exhaustive search. Our core idea includes calculating the…
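The abstract relies on the Roofline model without restating it. In its standard form, the model caps attainable throughput at the smaller of the compute peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). The sketch below is illustrative only. It assumes idealized DGEMM memory traffic, in which each of A, B and C crosses the memory interface exactly once, and uses made-up peak and bandwidth figures. The function names are hypothetical. Because DGEMM's intensity grows with the matrix size n, large DGEMMs sit to the right of the ridge point, which is why DGEMM is a natural probe for the compute roof.

```python
# Hypothetical sketch of the standard Roofline bound; not taken from the paper.

def roofline(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable GFLOP/s = min(compute peak, memory bandwidth * intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


def dgemm_intensity(n: int, bytes_per_elem: int = 8) -> float:
    """FLOP/byte for C = A @ B with n x n doubles, assuming (idealized) that
    each of A, B and C crosses the memory interface exactly once."""
    flops = 2 * n ** 3                      # n^3 multiply-add pairs
    traffic = 3 * n * n * bytes_per_elem    # bytes moved for A, B and C
    return flops / traffic


if __name__ == "__main__":
    # Illustrative machine numbers, not measurements from the paper.
    peak, bw = 1000.0, 100.0                # GFLOP/s, GB/s
    print("ridge point:", ridge_point(peak, bw), "FLOP/byte")
    for n in (64, 1200):
        i = dgemm_intensity(n)
        print(n, round(i, 2), roofline(i, peak, bw))
```

Under these assumptions the intensity simplifies to n/12 FLOP/byte, so only very small matrices fall under the bandwidth roof. Real DGEMM implementations move more data than this ideal, so measured points will sit somewhat lower.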
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Low-power high-performance VLSI design
