Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors

Johannes Hofmann; Dietmar Fey; Michael Riedmann; Jan Eitzinger; Georg Hager; Gerhard Wellein

arXiv:1604.01890·cs.PF·July 9, 2018

TL;DR

This paper analyzes the performance of a Kahan-enhanced scalar product on modern multi- and manycore processors, showing that with appropriate low-level optimizations (SIMD vectorization and unrolling) it can run nearly as fast as the naive (non-Kahan) scalar product.
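For reference, here is a minimal sketch of the two loops being compared. It assumes double-precision arrays and the textbook Kahan update, and the function names are hypothetical. The compensation term `c` only works if the compiler keeps the floating-point operations in program order, so the code must not be built with `-ffast-math`.

```c
#include <stddef.h>

/* Naive scalar product: one multiply and one add per element. */
double dot_naive(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Kahan-compensated scalar product: c tracks the rounding error of each
   addition to the running sum and is subtracted from the next product.
   Do NOT compile with -ffast-math: it allows the compiler to simplify
   (t - sum) - y to zero and silently remove the compensation. */
double dot_kahan(const double *a, const double *b, size_t n) {
    double sum = 0.0, c = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double y = a[i] * b[i] - c;  /* product corrected by previous error */
        double t = sum + y;          /* new, rounded running sum */
        c = (t - sum) - y;           /* what was actually added minus what should have been */
        sum = t;
    }
    return sum;
}
```

Take `a` = 1.0 followed by ten copies of 1e-16, with `b` all ones. `dot_naive` returns exactly 1.0 because each 1e-16 is below half an ulp of 1.0 and is rounded away. `dot_kahan` returns approximately 1.000000000000001.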

Contribution

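It provides a detailed performance analysis and SIMD-optimized implementations of the Kahan scalar product across multiple architectures. It also extends the execution-cache-memory (ECM) performance model beyond Intel multicore chips to the Intel Xeon Phi "Knights Corner" coprocessor and the IBM POWER8.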
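The portable C sketch below does not reproduce the paper's SIMD code. It only imitates the idea behind SIMD vectorization plus unrolling: `LANES` independent `(sum, c)` pairs, like the lanes of a vector register, break the single loop-carried dependency chain of the scalar Kahan loop. The names `LANES`, `kahan_add` and `dot_kahan_unrolled` are hypothetical, and combining the lanes with a final Kahan pass is one reasonable choice among several.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical: e.g. one 256-bit register of doubles */

/* One Kahan update on an accumulator pair (sum, c). */
static inline void kahan_add(double *sum, double *c, double x) {
    double y = x - *c;
    double t = *sum + y;
    *c = (t - *sum) - y;
    *sum = t;
}

/* Kahan scalar product with LANES independent accumulator chains.
   As with the scalar version, do not compile with -ffast-math. */
double dot_kahan_unrolled(const double *a, const double *b, size_t n) {
    double sum[LANES] = {0.0}, c[LANES] = {0.0};
    size_t i = 0;
    /* Main loop: each lane updates its own (sum, c) pair, so the
       LANES dependency chains can proceed in parallel. */
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; ++l)
            kahan_add(&sum[l], &c[l], a[i + l] * b[i + l]);
    /* Remainder loop for n not divisible by LANES. */
    for (; i < n; ++i)
        kahan_add(&sum[0], &c[0], a[i] * b[i]);
    /* Horizontal reduction: Kahan-sum the lane sums, then add the
       negated compensations (each c holds the excess that was added). */
    double s = 0.0, cs = 0.0;
    for (int l = 0; l < LANES; ++l) kahan_add(&s, &cs, sum[l]);
    for (int l = 0; l < LANES; ++l) kahan_add(&s, &cs, -c[l]);
    return s;
}
```

The per-lane arithmetic is identical to the scalar loop. Only the order in which products reach the accumulators changes, which is why results may differ from `dot_kahan` in the last bit while keeping compensated accuracy.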

Findings

01

The Kahan-enhanced scalar product has minimal overhead once SIMD vectorization and unrolling are applied

02

Performance bottlenecks identified through instruction analysis

03

Extended ECM model predicts performance across architectures
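To illustrate the kind of prediction the ECM model makes, here is a sketch of the classic single-core composition commonly used for Intel cores. In-core work that overlaps with data transfers (`t_ol`) is kept separate. Non-overlapping in-core work (`t_nol`) and the transfer times through the cache hierarchy are summed. Multicore scaling is assumed linear until the memory transfer time becomes the bottleneck. This is not the paper's extended model: its Knights Corner and POWER8 variants may combine these contributions differently. All struct, field and function names, and the numbers in the usage note, are hypothetical.

```c
/* All times in core cycles per unit of work (e.g. one cache line's
   worth of loop iterations). Names are hypothetical. */
typedef struct {
    double t_ol;     /* in-core cycles that overlap with data transfers */
    double t_nol;    /* in-core cycles that do not (e.g. load instructions) */
    double t_l1l2;   /* cycles to transfer the data between L2 and L1 */
    double t_l2l3;   /* cycles to transfer the data between L3 and L2 */
    double t_l3mem;  /* cycles to transfer the data between memory and L3 */
} ecm_input;

static double max2(double x, double y) { return x > y ? x : y; }

/* Single-core prediction for data in memory: the non-overlapping parts
   add up, and the overlapping in-core work either hides behind them or
   dominates. */
double ecm_mem_cycles(ecm_input in) {
    double serial = in.t_nol + in.t_l1l2 + in.t_l2l3 + in.t_l3mem;
    return max2(in.t_ol, serial);
}

/* Effective chip-level cycles per unit of work on n cores, assuming
   linear scaling until the memory transfer time is the limit. */
double ecm_cycles_n_cores(ecm_input in, int n) {
    return max2(ecm_mem_cycles(in) / n, in.t_l3mem);
}

/* Saturation point n_s = ceil(T_ECM / T_L3Mem), computed without libm. */
int ecm_saturation_cores(ecm_input in) {
    double ratio = ecm_mem_cycles(in) / in.t_l3mem;
    int n = (int)ratio;
    if (n < ratio) ++n;
    return n;
}
```

With made-up inputs `{2, 4, 4, 4, 8}`, the single-core prediction is 20 cycles per unit of work. The chip needs 10 cycles on two cores and saturates at 8 cycles from three cores on.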

Abstract

We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized implementations on recent multi- and manycore processors. Using low-level instruction analysis and the execution-cache-memory (ECM) performance model we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution, and predict performance and saturation behavior. We show that the Kahan-enhanced scalar product comes at almost no additional cost compared to the naive (non-Kahan) scalar product if appropriate low-level optimizations, notably SIMD vectorization and unrolling, are applied. The ECM model is extended appropriately to accommodate not only modern Intel multicore chips but also the Intel Xeon Phi "Knights Corner" coprocessor and an IBM POWER8…
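A back-of-the-envelope count helps explain why the compensation can come at almost no cost. It assumes double precision (8 B per element, two input arrays) and separate multiply and add instructions (no FMA). It writes ℓ_ADD for the floating-point add latency of the core and k for the number of independent accumulator chains (SIMD lanes times unroll factor). These symbols and bounds are illustrative assumptions, not figures from the paper.

```latex
\begin{aligned}
\text{naive: } & s \leftarrow s + a_i b_i
  &&\Rightarrow 2 \text{ flops and } 16\ \text{B loaded per iteration}\\
\text{Kahan: } & y \leftarrow a_i b_i - c,\; t \leftarrow s + y,\;
  c \leftarrow (t - s) - y,\; s \leftarrow t
  &&\Rightarrow 5 \text{ flops and } 16\ \text{B loaded per iteration}\\[4pt]
B_c^{\text{naive}} &= \frac{16\ \text{B}}{2\ \text{flops}} = 8\ \text{B/flop},
  \qquad B_c^{\text{Kahan}} = \frac{16\ \text{B}}{5\ \text{flops}} = 3.2\ \text{B/flop}\\[4pt]
T_{\text{iter}}^{\text{Kahan}} &\ge \frac{4\,\ell_{\text{ADD}}}{k}
  \quad \text{(loop-carried chain } c \to y \to t \to t - s \to c
  \text{: 4 dependent add/sub operations)}
\end{aligned}
```

Both variants move the same 16 B per iteration. When the data comes from memory, they therefore face the same bandwidth limit. The 2.5 times larger flop count and the four-operation dependency chain of the Kahan loop matter only when the in-core time exceeds the data transfer time. Increasing k through SIMD vectorization and unrolling is what keeps the in-core time small.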
