An Empirical Study of Intel Xeon Phi
Jianbin Fang, Ana Lucia Varbanescu, Henk Sips, Lilun Zhang, Yonggang Che, Chuanfu Xu

TL;DR
This paper provides an empirical analysis of the Intel Xeon Phi architecture, measuring its performance limits and factors affecting efficiency to aid programmers in optimizing applications for this many-core system.
Contribution
It microbenchmarks Xeon Phi's main hardware components (cores, memory hierarchy, ring interconnect, and PCIe connection) and derives optimization guidelines that give programmers a simplified view of the machine.
Findings
Performance close to theoretical peak in ideal conditions
Identified key causes of performance penalties
Developed simplified programming guidelines
Abstract
With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility - it can be used either as a coprocessor or as a stand-alone processor - are very tempting for parallel applications looking for new performance records. In this paper, we present an empirical study of Xeon Phi, stressing its performance limits and relevant performance factors, ultimately aiming to present a simplified view of the machine for regular programmers in search of performance. To do so, we have micro-benchmarked the main hardware components of the processor - the cores, the memory hierarchies, the ring interconnect, and the PCIe connection. We show that, in ideal microbenchmarking conditions, the…
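The theoretical peak quoted in the abstract can be sanity-checked with the usual back-of-the-envelope formula: cores × clock × SIMD lanes × floating-point operations per lane per cycle. The sketch below is only illustrative. The function name `peak_gflops` is hypothetical, and the hardware parameters (60 cores, 1.053 GHz, 8 double-precision lanes in a 512-bit vector unit, 2 operations per fused multiply-add) are assumptions taken from one commonly cited Xeon Phi configuration, not from the text above.

```python
def peak_gflops(cores, clock_ghz, simd_lanes, flops_per_lane_per_cycle=2):
    """Theoretical peak in GFLOPS.

    flops_per_lane_per_cycle defaults to 2, assuming one fused
    multiply-add (a multiply plus an add) per lane per cycle.
    """
    return cores * clock_ghz * simd_lanes * flops_per_lane_per_cycle


# Assumed configuration (not stated in the abstract):
# 60 cores, 1.053 GHz, 512-bit vectors = 8 double-precision lanes.
dp_peak = peak_gflops(cores=60, clock_ghz=1.053, simd_lanes=8)
print(f"Double-precision peak: {dp_peak:.1f} GFLOPS")
```

Under these assumed parameters the result is slightly above 1000 GFLOPS, consistent with the figure in the abstract. Measured throughput depends on how well an application keeps every core and vector lane busy.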
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
