MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect

Matheus Cavalcante; Samuel Riedel; Antonio Pullini; Luca Benini

arXiv:2012.02973 · cs.AR · July 21, 2022

TL;DR

MemPool is a scalable shared-L1 many-core system with low-latency interconnects, enabling efficient memory access and improved performance for signal processing workloads.

Contribution

This work introduces MemPool, a 32-bit many-core architecture with a novel low-latency interconnect and memory addressing scheme, demonstrating that a shared-L1 cluster can scale beyond 16 cores to 256 cores.

Findings

01. Average memory access latency under load is fewer than 6 cycles.

02. Memory bank addressing scheme improves performance by up to 20%.

03. Design achieves performance comparable to an ideal full-crossbar system.

Abstract

A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32-bit many-core system with 256 fast RV32IMA "Snitch" cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25 °C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most 5 cycles. In MemPool's physical-aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect. We compare three candidate topologies, analyzing them in terms of latency, throughput, and back-end feasibility. The chosen topology keeps the average…
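
To put the cycle counts quoted above into wall-clock terms, the following minimal sketch converts cycles to nanoseconds at the 700 MHz clock given in the abstract. The helper name `cycles_to_ns` is hypothetical, and the calculation assumes the quoted frequency applies to the whole access path. It is back-of-the-envelope arithmetic, not a measurement from the paper.

```python
# Clock frequency quoted in the abstract (typical conditions).
CLOCK_HZ = 700e6  # 700 MHz


def cycles_to_ns(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency.

    Hypothetical helper for illustration; not part of any MemPool tooling.
    """
    return cycles / clock_hz * 1e9


if __name__ == "__main__":
    # Cycle counts taken from the abstract and the findings list.
    figures = [
        ("maximum L1 access latency", 5),
        ("upper bound on average latency under load", 6),
    ]
    for label, cycles in figures:
        print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")
```

At 700 MHz one cycle lasts about 1.43 ns, so 5 cycles correspond to roughly 7.14 ns and 6 cycles to roughly 8.57 ns.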
