Coherence Traffic in Manycore Processors with Opaque Distributed Directories

Steve Kommrusch; Marcos Horro; Louis-Noël Pouchet; Gabriel Rodríguez; Juan Touriño

arXiv:2011.05422·cs.DC·November 12, 2020

TL;DR

This paper analyzes coherence traffic in manycore processors with distributed directories. It reveals the pseudo-random mapping of memory blocks to directory pieces and explores optimizations that reduce coherence traffic and memory latency. These optimizations improve memory throughput but do not improve overall performance.

Contribution

The paper uncovers the pseudo-random mapping of physical memory blocks across the pieces of the distributed directory in Intel Knights Landing processors, and it evaluates optimizations that reduce coherence traffic and memory latency.

Findings

01

Optimizations reduce coherence traffic and improve memory throughput.

02

Memory latency improvements do not translate into overall performance gains.

03

The memory block mapping function is pseudo-random and complex.
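The real Knights Landing mapping function is not given in this summary, so the sketch below is purely hypothetical. It only illustrates how a mapping from physical addresses to directory pieces can look pseudo-random: it XOR-folds the cache-line number into a few bits and then reduces that result modulo the number of directory pieces. The 64-byte line size, the 8-bit fold width, and the names `xor_fold` and `home_slice` are assumptions made for illustration.

```python
# Hypothetical illustration only: NOT the Knights Landing mapping function.
LINE_BYTES = 64  # assumed cache-line size in bytes


def xor_fold(value: int, width: int) -> int:
    """Fold a non-negative integer into `width` bits by XOR-ing successive chunks."""
    mask = (1 << width) - 1
    folded = 0
    while value:
        folded ^= value & mask  # mix in the lowest `width` bits
        value >>= width         # move on to the next chunk
    return folded


def home_slice(paddr: int, num_slices: int, width: int = 8) -> int:
    """Map a physical address to a hypothetical directory-piece id.

    All bytes of the same cache line share one line number, so they
    map to the same directory piece.
    """
    line = paddr // LINE_BYTES
    return xor_fold(line, width) % num_slices


if __name__ == "__main__":
    from collections import Counter

    # Spread of 3,800 consecutive cache lines over 38 hypothetical pieces.
    counts = Counter(home_slice(a, 38) for a in range(0, 64 * 3800, 64))
    print(sorted(counts.items()))
```

Because 256 fold values do not divide evenly over 38 pieces, this toy hash is only roughly balanced. It also shows why such a function is opaque from software: neighboring lines land on unrelated pieces unless the bit-mixing is known.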

Abstract

Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Intel Mesh interconnect, which consists of a network-on-chip interconnecting "tiles", each of which contains computation cores, local caches, and coherence masters. The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency. This paper studies the physical layout of an Intel Knights Landing processor, with a particular focus on the coherence subsystem, and uncovers the pseudo-random mapping function of physical memory blocks across the pieces of the distributed directory. Leveraging this knowledge, candidate optimizations to improve memory latency through the minimization of coherence traffic are studied.…
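The abstract notes that every out-of-tile access must query the distributed directory, which adds latency. A minimal cost model makes this concrete. It assumes a hypothetical 2D mesh of tiles with dimension-ordered (XY) routing, where the hop count between tiles is their Manhattan distance, and it assumes that home directory pieces are spread uniformly over all tiles. The grid size and the function names are illustrative assumptions. This sketch does not reproduce the paper's optimizations.

```python
from itertools import product

Tile = tuple[int, int]  # (row, column) position on the mesh


def mesh_hops(src: Tile, dst: Tile) -> int:
    """Hop count between two tiles under XY routing on a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def avg_directory_hops(requester: Tile, rows: int, cols: int) -> float:
    """Average hops from a requester to the home directory tile, assuming
    homes are uniformly distributed over every tile of a rows x cols mesh."""
    tiles = list(product(range(rows), range(cols)))
    return sum(mesh_hops(requester, t) for t in tiles) / len(tiles)


if __name__ == "__main__":
    # Hypothetical 6x6 mesh: a corner tile pays more hops than a central one.
    print(avg_directory_hops((0, 0), 6, 6))
    print(avg_directory_hops((2, 2), 6, 6))
```

In this model, an access whose home directory piece lies in the requester's own tile costs zero mesh hops. That is the intuition behind reducing latency by minimizing coherence traffic. Real latency also depends on contention and memory-controller placement, which the model ignores.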

Taxonomy

Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing