Importance of Explicit Vectorization for CPU and GPU Software Performance

Neil G. Dickson; Kamran Karimi; Firas Hamze

arXiv:1004.0024 · cs.DC · May 18, 2015


TL;DR

This paper shows that explicit vectorization on the CPU and explicit memory coalescing on the GPU are crucial for performance, yielding large speedups for a Metropolis Monte Carlo algorithm.

Contribution

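The study analyzes several optimizations applied to both CPU and GPU implementations of a computationally intensive Metropolis Monte Carlo algorithm, and highlights explicit vectorization and memory coalescing as techniques that complement multi-threading in high-performance computing.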

Findings

01

Fully-optimized CPU version is 9x to 12x faster than the original CPU version, on top of the multi-threading speedup

02

Fully-optimized CPU version is 2x faster than the fully-optimized GPU version

03

Explicit vectorization (CPU) and explicit memory coalescing (GPU) are critical for performance
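The summary does not show what explicit vectorization looks like in code. As a minimal sketch, assuming an Ising-type energy E = -sum_{i<j} J_ij s_i s_j (the exact model is not stated here), the C code below runs one Metropolis sweep over four independent replicas at once: spin i of every replica is packed into one 16-byte vector, so each arithmetic operation updates all four lanes. It uses GCC/Clang vector extensions rather than hand-written SSE intrinsics or assembly, and the names `metropolis_sweep`, `NSPINS` and `thresholds` are hypothetical, not taken from the paper. To keep `exp` out of the vector loop, the acceptance test u < exp(-beta * dE) is rewritten as dE < -ln(u)/beta, with the right-hand side assumed to be precomputed in scalar code from uniform random numbers u in (0, 1).

```c
/* Explicitly vectorized Metropolis sweep over 4 replicas (illustrative sketch). */

/* Four 32-bit lanes in one 16-byte SIMD register (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

#define NSPINS 4  /* hypothetical tiny system size */

/* spins[i] holds spin i (+1 or -1) of all four replicas (replica-interleaved layout).
 * J is symmetric with a zero diagonal; thresholds[i] holds -ln(u)/beta per replica. */
static void metropolis_sweep(v4sf spins[NSPINS],
                             const float J[NSPINS][NSPINS],
                             const v4sf thresholds[NSPINS]) {
    const v4sf one = {1.0f, 1.0f, 1.0f, 1.0f};
    const v4sf two = {2.0f, 2.0f, 2.0f, 2.0f};
    for (int i = 0; i < NSPINS; ++i) {
        /* Local field h_i = sum_j J_ij s_j, computed for all four replicas at once. */
        v4sf h = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int j = 0; j < NSPINS; ++j) {
            const v4sf jij = {J[i][j], J[i][j], J[i][j], J[i][j]};
            h += jij * spins[j];
        }
        /* Energy change of flipping s_i: dE = 2 * s_i * h_i. */
        const v4sf dE = two * spins[i] * h;
        /* Lane-wise Metropolis test; each mask lane is all ones (true) or zero. */
        const v4si mask = dE < thresholds[i];
        /* AND the bits of 1.0f with the mask: 1.0f where accepted, 0.0f elsewhere. */
        const v4sf accepted = (v4sf)(mask & (v4si)one);
        /* Multiply by -1 in accepted lanes and by +1 in rejected lanes. */
        spins[i] *= one - two * accepted;
    }
}
```

With every threshold set to 0 (the zero-temperature limit) only strictly downhill flips are accepted, yet each lane still follows its own trajectory because the accept/reject decision is a per-lane mask rather than a branch. The replica-interleaved layout is the design choice that makes this possible: the same spin of different replicas sits in one register.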

Abstract

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.
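Memory coalescing, the GPU counterpart named in the abstract, depends on consecutive threads of a warp reading consecutive addresses. The host-side C model below (not CUDA) compares two hypothetical layouts for per-thread spin arrays and counts how many memory segments one warp touches when every thread reads the same spin index. It assumes 32-thread warps, 4-byte floats and a simplified 128-byte segment per transaction; real coalescing rules vary across GPU architectures, and the function names are illustrative rather than the paper's.

```c
#include <stddef.h>

#define WARP_SIZE 32       /* threads per warp (assumed) */
#define SEGMENT_BYTES 128  /* simplified memory transaction size (assumed) */

/* Thread-major layout: each thread's spins are contiguous in memory. */
static size_t idx_thread_major(size_t thread, size_t spin, size_t nspins, size_t nthreads) {
    (void)nthreads;
    return thread * nspins + spin;
}

/* Spin-major (interleaved) layout: the same spin of all threads is contiguous. */
static size_t idx_spin_major(size_t thread, size_t spin, size_t nspins, size_t nthreads) {
    (void)nspins;
    return spin * nthreads + thread;
}

typedef size_t (*index_fn)(size_t, size_t, size_t, size_t);

/* Distinct segments touched when threads 0..WARP_SIZE-1 each read float `spin`. */
static int segments_per_warp(index_fn idx, size_t spin, size_t nspins, size_t nthreads) {
    size_t seen[WARP_SIZE];
    int count = 0;
    for (size_t t = 0; t < WARP_SIZE; ++t) {
        size_t seg = idx(t, spin, nspins, nthreads) * sizeof(float) / SEGMENT_BYTES;
        int duplicate = 0;
        for (int k = 0; k < count; ++k) {
            if (seen[k] == seg) { duplicate = 1; break; }
        }
        if (!duplicate) seen[count++] = seg;
    }
    return count;
}
```

For example, with 64 spins per thread and 1024 threads, `segments_per_warp(idx_spin_major, 5, 64, 1024)` touches a single segment while `idx_thread_major` touches 32, which is why interleaving data across threads is the usual route to coalesced access.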
