TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
Yichao Zhang, Marco Bertuletti, Chi Zhang, Samuel Riedel, Diyou Shen, Bowen Wang, Alessandro Vanelli-Coralli, Luca Benini

TL;DR
TeraPool is a scaled-up RISC-V cluster with more than 1000 cores that share a multi-megabyte L1 memory. It targets high bandwidth and energy efficiency for massively parallel computing.
Contribution
This paper introduces TeraPool, a physically implementable design for a large-scale RISC-V cluster with a shared L1 memory and a high-bandwidth main memory interface.
Findings
Achieves a 910 MHz clock frequency in 12 nm FinFET technology.
Delivers up to 1.89 TFLOP/s peak performance.
Consumes 9 to 13.5 pJ per memory access.
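As a rough sanity check on the findings above, the peak throughput can be divided by clock frequency and core count to estimate the floating-point operations each core would need to complete per cycle. This is a back-of-the-envelope sketch only: it assumes the 1.89 TFLOP/s and 910 MHz figures refer to the same operating point and takes the core count of 1024 from the title; the function name is hypothetical.

```python
# Back-of-the-envelope check of the reported peak performance.
# Assumptions (not confirmed by this summary): the peak and the clock
# frequency describe the same operating point, and all 1024 cores
# contribute equally to the peak.

PEAK_FLOPS = 1.89e12   # reported peak, FLOP/s
FREQ_HZ = 910e6        # reported clock frequency, Hz
NUM_CORES = 1024       # core count from the title


def flop_per_core_per_cycle(peak_flops: float, freq_hz: float, num_cores: int) -> float:
    """Implied floating-point operations per core per clock cycle."""
    return peak_flops / (freq_hz * num_cores)


implied = flop_per_core_per_cycle(PEAK_FLOPS, FREQ_HZ, NUM_CORES)
print(f"Implied FLOP per core per cycle: {implied:.2f}")
```

Under these assumptions the result comes out at roughly 2 FLOP per core per cycle, which would be consistent with each core issuing about one fused multiply-add per cycle, although the summary does not state how the peak figure was obtained.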
Abstract
Shared L1-memory clusters of streamlined instruction processors (processing elements - PEs) are commonly used as building blocks in modern, massively parallel computing architectures (e.g. GP-GPUs). Scaling out these architectures by increasing the number of clusters incurs computational and power overhead, caused by the requirement to split and merge large data structures in chunks and move chunks across memory hierarchies via the high-latency global interconnect. Scaling up the cluster reduces buffering, copy, and synchronization overheads. However, the complexity of a fully connected cores-to-L1-memory crossbar grows quadratically with PE-count, posing a major physical implementation challenge. We present TeraPool, a physically implementable, >1000 floating-point-capable RISC-V PEs scaled-up cluster design, sharing a Multi-MegaByte >4000-banked L1 memory via a low latency…
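The quadratic growth mentioned in the abstract can be illustrated by counting crosspoints in a flat, fully connected crossbar. The sketch below is an illustration, not the paper's interconnect model: it assumes the number of banks scales with the number of PEs by a fixed banking factor of 4 (consistent with the ">1000 PEs" and ">4000 banks" figures, but not stated explicitly here), and the function name is hypothetical.

```python
# Crosspoint count for a flat, fully connected PE-to-bank crossbar.
# Assumption: banks = banking_factor * PEs, with banking_factor = 4
# chosen only to match the ">1000 PEs, >4000 banks" ratio in the abstract.

def crossbar_crosspoints(num_pes: int, banking_factor: int = 4) -> int:
    """Number of PE-to-bank connection points in a single-stage crossbar."""
    num_banks = num_pes * banking_factor
    return num_pes * num_banks


for num_pes in (16, 64, 256, 1024):
    print(f"{num_pes:5d} PEs -> {crossbar_crosspoints(num_pes):>9,d} crosspoints")
```

Each doubling of the PE count quadruples the crosspoint count in this model, which is why a single flat crossbar becomes hard to implement physically at this scale.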
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Low-power high-performance VLSI design
