Performance analysis of the Kahan-enhanced scalar product on current multicore processors

Johannes Hofmann; Dietmar Fey; Jan Eitzinger; Georg Hager; Gerhard Wellein

arXiv:1505.02586·cs.PF·February 19, 2019

TL;DR

This paper analyzes the performance of a Kahan-enhanced scalar product on recent Intel multicore processors. It shows that, with appropriate low-level optimizations, the Kahan version runs at almost the same speed as the naive scalar product, and it examines how architectural changes affect performance.

Contribution

It provides a detailed performance analysis and efficient SIMD-vectorized implementations of the Kahan scalar product across four generations of Intel Xeon processors.

Findings

01

The Kahan-enhanced scalar product achieves nearly the same performance as the naive (non-Kahan) implementation when SIMD vectorization and unrolling are applied.

02

Performance bottlenecks for single-core and thread-parallel execution are identified using low-level instruction analysis and the execution-cache-memory (ECM) performance model.

03

Architectural changes across the four Intel Xeon generations studied affect both performance and saturation behavior.

Abstract

We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized implementations on recent Intel processors. Using low-level instruction analysis and the execution-cache-memory (ECM) performance model we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution, and predict performance and saturation behavior. We show that the Kahan-enhanced scalar product comes at almost no additional cost compared to the naive (non-Kahan) scalar product if appropriate low-level optimizations, notably SIMD vectorization and unrolling, are applied. We also investigate the impact of architectural changes across four generations of Intel Xeon processors.
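To make the kernel concrete, here is a minimal sketch of a naive scalar product next to a Kahan-compensated one. It is written in plain Python for readability and is not the paper's SIMD implementation; the function names (`naive_dot`, `kahan_add`, `kahan_dot`, `kahan_dot_lanes`) and the `lanes` parameter are hypothetical choices made for this illustration. The last function only mimics the idea behind unrolling and vectorization: several independent partial sums, each with its own compensation term, are combined at the end.

```python
def naive_dot(a, b):
    """Plain scalar product: one running sum, no error compensation."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s


def kahan_add(s, c, value):
    """One Kahan step: add `value` to the sum `s` using compensation `c`.

    Returns the updated (sum, compensation) pair.
    """
    y = value - c        # apply the error carried over from earlier steps
    t = s + y            # low-order bits of y may be lost in this addition
    c = (t - s) - y      # recover what was lost (with opposite sign)
    return t, c


def kahan_dot(a, b):
    """Scalar product with Kahan-compensated accumulation."""
    s, c = 0.0, 0.0
    for x, y in zip(a, b):
        s, c = kahan_add(s, c, x * y)
    return s


def kahan_dot_lanes(a, b, lanes=4):
    """Kahan scalar product with `lanes` independent partial sums.

    Illustrative assumption: each lane stands in for one SIMD lane or one
    unrolled accumulator, so consecutive updates do not depend on each other.
    """
    sums = [0.0] * lanes
    comps = [0.0] * lanes
    for i, (x, y) in enumerate(zip(a, b)):
        k = i % lanes
        sums[k], comps[k] = kahan_add(sums[k], comps[k], x * y)

    # Final reduction: combine the lane sums and their leftover compensation
    # terms, again with compensated additions.
    total, comp = 0.0, 0.0
    for k in range(lanes):
        total, comp = kahan_add(total, comp, sums[k])
        total, comp = kahan_add(total, comp, -comps[k])
    return total
```

The compensated versions pay off when many small products are added to a large running sum, because the naive loop silently drops contributions that fall below the rounding threshold of the accumulator. The extra arithmetic per element is what the paper's analysis shows can be hidden behind data transfers once the loop is vectorized and unrolled.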
