An Empirical Study of Intel Xeon Phi

Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu

arXiv:1310.5842 · cs.DC · December 23, 2013 · 37 citations



TL;DR

This paper provides an empirical analysis of the Intel Xeon Phi architecture, measuring its performance limits and factors affecting efficiency to aid programmers in optimizing applications for this many-core system.

Contribution

It microbenchmarks Xeon Phi's main hardware components in detail and distills the results into optimization guidelines that give programmers a simplified view of the machine.
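As a rough illustration of what microbenchmarking a memory hierarchy involves, the sketch below times a bulk copy and converts the time into a bandwidth figure. It is a minimal host-side sketch, not the authors' benchmark code: the function name `copy_bandwidth_gbs`, the 256 MiB default buffer, and best-of-five timing are assumptions chosen for illustration, and the byte count follows the STREAM Copy convention (one read plus one write per byte copied).

```python
import time


def copy_bandwidth_gbs(n_bytes: int = 256 * 1024 * 1024, repeats: int = 5) -> float:
    """Hypothetical helper: best-of-`repeats` copy bandwidth in GB/s.

    Counts one read and one write per byte, as the STREAM Copy kernel does.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    src_view, dst_view = memoryview(src), memoryview(dst)

    # Warm-up copy so that page faults are not charged to the timed runs.
    dst_view[:] = src_view

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst_view[:] = src_view  # one bulk copy of contiguous bytes
        best = min(best, time.perf_counter() - start)

    # Guard against a zero reading from a coarse timer on tiny buffers.
    best = max(best, 1e-9)
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

Taking the best of several runs filters out one-off interference such as other processes or frequency changes. A single Python thread will not come close to saturating a many-core chip's memory system; bandwidth microbenchmarks on such hardware typically run one stream per thread across all cores.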

Findings

1. Performance close to the theoretical peak under ideal conditions
2. Identified key causes of performance penalties
3. Developed simplified programming guidelines

Abstract

With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPS and over 240 GB/s. These numbers, as well as its flexibility (it can be used both as a coprocessor and as a stand-alone processor), are very tempting for parallel applications looking for new performance records. In this paper, we present an empirical study of Xeon Phi, stressing its performance limits and relevant performance factors, ultimately aiming to present a simplified view of the machine for regular programmers in search of performance. To do so, we have micro-benchmarked the main hardware components of the processor: the cores, the memory hierarchies, the ring interconnect, and the PCIe connection. We show that, in ideal microbenchmarking conditions, the…
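The theoretical compute peak quoted in the abstract follows from simple arithmetic: cores × clock rate × floating-point operations per core per cycle. The sketch below assumes a 60-core part clocked at 1.053 GHz (the specific model is not named in the abstract) with 512-bit vector units and fused multiply-add, so each core can retire 16 double-precision or 32 single-precision FLOPs per cycle; `peak_gflops` is a hypothetical helper, not code from the paper.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Assumed configuration (not stated in the abstract): 60 cores at 1.053 GHz.
CORES = 60
CLOCK_GHZ = 1.053

# A 512-bit vector holds 8 doubles or 16 floats; one FMA counts as 2 FLOPs.
DP_FLOPS_PER_CYCLE = 8 * 2
SP_FLOPS_PER_CYCLE = 16 * 2

dp = peak_gflops(CORES, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
sp = peak_gflops(CORES, CLOCK_GHZ, SP_FLOPS_PER_CYCLE)

if __name__ == "__main__":
    print(f"double precision: {dp:.1f} GFLOPS")  # double precision: 1010.9 GFLOPS
    print(f"single precision: {sp:.1f} GFLOPS")  # single precision: 2021.8 GFLOPS
```

Under these assumptions the double-precision result matches the roughly 1000 GFLOPS in the abstract. Sustained throughput falls below it whenever the vector units are not issuing a full-width FMA on every core in every cycle.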


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques