Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

Elmar Peise (1); Paolo Bientinesi (1) ((1) AICES; RWTH Aachen)

arXiv:1409.8602 · cs.PF · October 1, 2014 · 1 citation

TL;DR

This paper introduces a cache-aware performance modeling approach for dense linear algebra that predicts the fastest implementation and the best tuning parameters without executing the algorithms themselves.

Contribution

The paper presents a methodology that predicts the performance of dense linear algebra algorithms by modeling their kernel routines and tracking cache contents, so that candidate algorithms and configurations can be compared without running them.
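To make the idea of combining kernel models with cache tracking concrete, here is a minimal sketch. It is not the authors' implementation: the names `KERNEL_MODELS`, `CacheTracker`, `predict_time`, and `pick_fastest` are hypothetical, the cost figures are invented for illustration, and the cache is assumed to behave as a simple LRU over whole operands.

```python
from collections import OrderedDict

# Hypothetical per-kernel cost models: seconds per unit of work when the
# operands are already in cache ("warm") versus not ("cold").
# These numbers are made up for illustration, not measured.
KERNEL_MODELS = {
    "gemm": {"warm": 1.0e-9, "cold": 1.3e-9},
    "trsm": {"warm": 1.5e-9, "cold": 2.5e-9},
}


class CacheTracker:
    """Toy LRU model of which operands are resident in cache (an assumption)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.resident = OrderedDict()  # operand name -> size in bytes

    def all_resident(self, operands):
        return all(name in self.resident for name, _ in operands)

    def touch(self, operands):
        for name, size in operands:
            # Re-insert so the operand becomes the most recently used entry.
            self.resident.pop(name, None)
            if size <= self.capacity:
                self.resident[name] = size
        # Evict least recently used operands until everything fits.
        while sum(self.resident.values()) > self.capacity:
            self.resident.popitem(last=False)


def predict_time(calls, cache_bytes):
    """Sum modeled kernel times for a call sequence without running it.

    Each call is a tuple (kernel_name, work_units, [(operand, bytes), ...]).
    """
    cache = CacheTracker(cache_bytes)
    total = 0.0
    for kernel, work, operands in calls:
        state = "warm" if cache.all_resident(operands) else "cold"
        total += work * KERNEL_MODELS[kernel][state]
        cache.touch(operands)
    return total


def pick_fastest(algorithms, cache_bytes):
    """Return the name of the algorithm with the lowest predicted time."""
    return min(algorithms, key=lambda name: predict_time(algorithms[name], cache_bytes))
```

In this sketch, an algorithm is described only by the sequence of kernel calls it would make, so calling `pick_fastest` on a dictionary of such sequences ranks the variants without executing any of them. A realistic model would replace the two-rate table with timings measured per kernel and problem size.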

Findings

01 Performance predictions are within a few percent of the optimal results.

02 The methodology effectively identifies the best algorithm among alternatives.

03 It enables efficient tuning of linear algebra routines without actual execution.

Abstract

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK, and typically each operation can be obtained in many alternative ways. Interestingly, identifying the fastest implementation -- without executing it -- is a challenging task even for experts. An equally challenging task is that of tuning each routine to performance-optimal configurations. Indeed, the problem is so difficult that even the default values provided by the libraries are often considerably suboptimal; as a solution, normally one has to resort to executing and timing the routines, driven by some form of parameter search. In this paper, we discuss a methodology to solve both problems: identifying the best performing algorithm within a family of…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems