Introducing a Performance Model for Bandwidth-Limited Loop Kernels

Jan Treibig; Georg Hager

arXiv:0905.0792·cs.PF·May 7, 2009·21 cites

Open Access

TL;DR

This paper introduces a performance model for bandwidth-limited loop kernels, based on an analysis of cache-based microarchitectures. The model predicts performance accurately and shows how each level of the memory hierarchy contributes to it.

Contribution

The paper presents a performance model designed specifically for bandwidth-limited loop kernels and validates it on three x86-type quad-core architectures.

Findings

01 Accurate performance prediction for raw memory load, store, and copy operations and a stream vector triad

02 Insight into how each memory hierarchy level contributes to performance

03 Model validated on three x86-type quad-core architectures

Abstract

We present a performance model for bandwidth-limited loop kernels which is founded on the analysis of modern cache-based microarchitectures. This model allows an accurate performance prediction and evaluation for existing instruction codes. It provides an in-depth understanding of how performance for different memory hierarchy levels is made up. The performance of raw memory load, store, and copy operations and a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures in order to demonstrate the capabilities of the model.
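To make the four kernels named in the abstract concrete, here is a minimal sketch of what each loop does. It is not the authors' benchmark code. The function names, the 8-byte (double-precision) element size, and the STREAM-style triad form a[i] = b[i] + s * c[i] are all assumptions made for illustration. The helper counts only the loads and stores written in the loop body and ignores any additional traffic the cache hierarchy may generate.

```python
# Illustrative definitions of the four kernels named in the abstract.
# Assumptions (not taken from the paper): double-precision elements,
# the STREAM-style triad a[i] = b[i] + s * c[i], and all names below.

BYTES_PER_ELEMENT = 8  # assumed: double precision


def load(a):
    """Read every element of a; the running sum keeps the loads live."""
    total = 0.0
    for x in a:
        total += x
    return total


def store(a, value):
    """Write a constant to every element of a."""
    for i in range(len(a)):
        a[i] = value


def copy(a, b):
    """a[i] = b[i]: one load stream and one store stream."""
    for i in range(len(a)):
        a[i] = b[i]


def triad(a, b, c, s):
    """a[i] = b[i] + s * c[i]: two load streams and one store stream."""
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]


# Explicit (load streams, store streams) per loop iteration.
STREAMS = {"load": (1, 0), "store": (0, 1), "copy": (1, 1), "triad": (2, 1)}


def explicit_bytes_per_iteration(kernel):
    """Bytes named in the loop body per iteration.

    Ignores extra traffic the cache hierarchy may add, such as
    write-allocate transfers on store misses.
    """
    loads, stores = STREAMS[kernel]
    return (loads + stores) * BYTES_PER_ELEMENT
```

Under these assumptions, the triad touches three streams per iteration and the copy two, which is the kind of per-kernel data volume a bandwidth-oriented model would start from.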

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Cloud Computing and Resource Management