Inter-thread Communication in Multithreaded, Reconfigurable Coarse-grain Arrays

Dani Voitsechov; Yoav Etsion

arXiv:1801.05178·cs.AR·January 17, 2018

TL;DR

This paper proposes direct inter-thread communication for multithreaded CGRAs, reporting an average 4.5x speedup and an average 7x power reduction compared with traditional memory-based communication on GPGPUs.

Contribution

It introduces a new communication model, hardware primitives, and system extensions enabling direct thread-to-thread data exchange in CGRAs.

Findings

01. Average speedup of 4.5x over GPGPU

02. Power reduction of 7x on average

03. Elimination of barriers and scratchpad memory

Abstract

Traditional von Neumann GPGPUs allow threads to communicate only through memory, on a group-to-group basis. In this model, a group of producer threads writes intermediate values to memory, and a group of consumer threads reads them after a barrier synchronization. To alleviate the memory bandwidth pressure imposed by this method of communication, GPGPUs provide a small scratchpad memory that keeps intermediate values from overloading DRAM bandwidth. In this paper we introduce direct inter-thread communication for massively multithreaded CGRAs, where intermediate values are communicated directly through the compute fabric on a point-to-point basis. This method avoids the need to write values to memory, eliminates the need for a dedicated scratchpad, and avoids workgroup-global barriers. The paper introduces the programming model (CUDA) and execution model extensions, as well as the hardware…
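The memory-plus-barrier pattern described in the abstract can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions: it uses Python threads rather than the CUDA extensions from the paper, and the names `neighbor_sums`, `scratchpad`, and `worker` are hypothetical. Each simulated thread writes an intermediate value to a shared buffer, waits at a group-wide barrier, and only then reads its neighbor's value.

```python
import threading


def neighbor_sums(values):
    """Simulate one workgroup: thread i produces values[i] * 2, then
    consumes its own result plus its left neighbor's result."""
    n = len(values)
    if n == 0:
        return []

    scratchpad = [None] * n         # stands in for GPGPU scratchpad memory
    out = [None] * n
    barrier = threading.Barrier(n)  # stands in for a workgroup-wide barrier

    def worker(tid):
        # Producer phase: write the intermediate value to shared memory.
        scratchpad[tid] = values[tid] * 2
        # No thread may read until every thread has finished writing.
        barrier.wait()
        # Consumer phase: read the neighbor's value back from memory.
        left = scratchpad[tid - 1] if tid > 0 else 0
        out[tid] = scratchpad[tid] + left

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(neighbor_sums([1, 2, 3, 4]))
```

Every thread pays for a memory write, a group-wide wait, and a memory read, even though each value has only a single consumer. The direct point-to-point scheme in the abstract targets exactly that overhead by forwarding the intermediate value through the compute fabric instead.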
