A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators

Luca Colagrande; Lorenzo Leone; Chen Wu; Tim Fischer; Raphael Roth; Luca Benini

arXiv:2603.26438 · cs.AR · May 13, 2026

TL;DR

This paper introduces a lightweight, high-throughput NoC with collective-communication support (barrier synchronization, multicast, and reduction) for large-scale ML accelerators, reporting speedups of 5.3x on multicast and 2.8x on reduction operations along with improved energy efficiency.

Contribution

The paper presents a NoC design with Direct Compute Access (DCA), which gives the interconnect fabric direct access to the cores' computational resources. This enables high-throughput in-network reductions for scalable ML workloads at a 16.9% router area overhead.

Findings

01. Achieves 5.3x speedup in multicast operations

02. Achieves 2.8x speedup in reduction operations

03. Scales efficiently to large mesh architectures with significant performance gains

Abstract

The exponential increase in Machine Learning (ML) model size and complexity has driven unprecedented demand for high-performance acceleration systems. As technology scaling enables the integration of thousands of computing elements onto a single die, the boundary between distributed and on-chip systems has blurred, making efficient on-chip collective communication increasingly critical. In this work, we present a lightweight, collective-capable Network on Chip (NoC) that supports efficient barrier synchronization alongside scalable, high-bandwidth multicast and reduction operations, co-designed for the next generation of ML accelerators. We introduce Direct Compute Access (DCA), a novel paradigm that grants the interconnect fabric direct access to the cores' computational resources, enabling high-throughput in-network reductions with a small 16.9% router area overhead. Through…
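To make the motivation for in-network collectives concrete, the following is a minimal counting sketch, not the paper's router design or its evaluation method. It assumes a square 2D mesh with the root at corner (0, 0), shortest-path routing, and a row-then-column tree. All function names are hypothetical. It counts link traversals only, so it says nothing about DCA, cycle-level timing, or the reported 5.3x and 2.8x speedups.

```python
def unicast_link_traversals(n: int) -> int:
    """Links crossed when node (0, 0) sends a separate copy to every
    node of an n x n mesh along a shortest (Manhattan) path."""
    return sum(x + y for x in range(n) for y in range(n))


def multicast_link_traversals(n: int) -> int:
    """Links crossed when routers replicate the data along a spanning
    tree: every node is reached over exactly one incoming link."""
    return n * n - 1


def mesh_reduce(grid):
    """Sum all values toward node (0, 0), combining partial sums at each
    hop (assumed row-then-column order). Returns (total, hops)."""
    partial = [row[:] for row in grid]  # copy so the input is untouched
    hops = 0
    # Phase 1: each row accumulates toward column 0.
    for r in range(len(partial)):
        for c in range(len(partial[r]) - 1, 0, -1):
            partial[r][c - 1] += partial[r][c]
            hops += 1
    # Phase 2: column 0 accumulates toward row 0.
    for r in range(len(partial) - 1, 0, -1):
        partial[r - 1][0] += partial[r][0]
        hops += 1
    return partial[0][0], hops
```

Under these assumptions, a 4 x 4 mesh needs 48 link traversals for repeated unicasts but 15 when routers replicate the data, and a reduction that combines partial sums in the network also crosses 15 links, one per non-root node.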
