Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication
Samuel Riedel, Yichao Zhang, Marco Bertuletti, Luca Benini

TL;DR
This paper investigates optimal multi-cluster configurations for next-generation wireless systems, demonstrating that larger single clusters can outperform multiple smaller ones in speed and efficiency for data processing workloads.
Contribution
It provides an analysis of cluster size trade-offs, extends an open-source shared-memory cluster, and introduces a double-buffering barrier to improve performance.
Findings
A single 256-core cluster is twice as fast as sixteen 16-core clusters for memory-bound tasks.
Larger clusters reduce synchronization overhead, improving performance.
The proposed double-buffering barrier decouples the processor cores from the DMA, enhancing efficiency.
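To make the benefit of decoupling compute from DMA transfers concrete, the following is a minimal timing sketch of generic double buffering. It is not the paper's barrier implementation. The function names, the per-tile costs `t_dma` and `t_compute`, and the omission of write-back traffic are all assumptions made for illustration.

```python
def single_buffered_time(n_tiles, t_dma, t_compute):
    """Total time when each tile is loaded and then processed, with no overlap.

    Hypothetical model: t_dma and t_compute are per-tile costs in arbitrary units.
    """
    return n_tiles * (t_dma + t_compute)


def double_buffered_time(n_tiles, t_dma, t_compute):
    """Total time when the load of tile i+1 overlaps the compute of tile i.

    Only the first load and the last compute are fully exposed; every step
    in between costs the slower of the two overlapped activities.
    """
    if n_tiles == 0:
        return 0
    return t_dma + (n_tiles - 1) * max(t_dma, t_compute) + t_compute


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    tiles, dma, compute = 4, 10, 10
    print("single-buffered:", single_buffered_time(tiles, dma, compute))
    print("double-buffered:", double_buffered_time(tiles, dma, compute))
```

Under this simplified model the gain is largest when transfer and compute times are balanced, and it shrinks as one of them dominates.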
Abstract
Next-generation wireless technologies (for immersive-massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to hundreds of cores into shared-memory clusters, which are then scaled out as multi-cluster manycore systems. This hierarchical design, used in GPUs and accelerators, requires a balancing act between fewer large clusters and more smaller clusters, affecting design complexity, synchronization, communication efficiency, and programmability. While all multi-cluster architectures must balance these trade-offs, there is limited insight into optimal cluster sizes. This paper analyzes various cluster configurations, focusing on synchronization, data movement overhead, and programmability for typical wireless sensing and communication workloads. We extend the…
