A Multicast-Capable AXI Crossbar for Many-core Machine Learning Accelerators

Luca Colagrande; Luca Benini

arXiv:2502.19215·cs.AR·November 11, 2025

TL;DR

This paper introduces a multicast-capable AXI crossbar for many-core machine learning accelerators. It improves data movement efficiency, delivering a 29% matrix multiplication speedup in a 288-core accelerator, with crossbar area and timing overheads of 12% and 6%, respectively.

Contribution

It presents a lightweight, flexible multicast extension for AXI crossbars and integrates it into an open-source 288-core accelerator, where it speeds up matrix multiplication by 29%.

Findings

1. 29% speedup in matrix multiplication performance
2. Modest area and timing overhead (12% and 6%, respectively), even on a 16-to-16 AXI crossbar
3. Integration into an open-source 288-core accelerator

Abstract

To keep up with the growing computational requirements of machine learning workloads, many-core accelerators integrate an ever-increasing number of processing elements, putting the efficiency of memory and interconnect subsystems to the test. In this work, we present the design of a multicast-capable AXI crossbar, with the goal of enhancing data movement efficiency in massively parallel machine learning accelerators. We propose a lightweight, yet flexible, multicast implementation, with a modest area and timing overhead (12% and 6% respectively) even on the largest physically-implementable 16-to-16 AXI crossbar. To demonstrate the flexibility and end-to-end benefits of our design, we integrate our extension into an open-source 288-core accelerator. We report tangible performance improvements on a key computational kernel for machine learning workloads, matrix multiplication, measuring a…
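As a rough illustration of why multicast can improve data movement, the sketch below counts the write transactions a master would issue to deliver the same data to several destinations, with and without multicast. It is not the paper's design: the function names and the one-transaction-per-destination model are assumptions made purely for illustration. The last helper converts a reported speedup into the fraction of runtime saved, assuming speedup is defined as old runtime divided by new runtime, minus one.

```python
def unicast_transactions(num_destinations: int) -> int:
    """Assumed model: without multicast, one write is issued per destination."""
    if num_destinations < 0:
        raise ValueError("num_destinations must be non-negative")
    return num_destinations


def multicast_transactions(num_destinations: int) -> int:
    """Assumed model: with multicast, one write is replicated by the crossbar."""
    if num_destinations < 0:
        raise ValueError("num_destinations must be non-negative")
    return 1 if num_destinations > 0 else 0


def runtime_saved_fraction(speedup_percent: float) -> float:
    """Fraction of runtime saved, assuming speedup = t_old / t_new - 1."""
    return 1.0 - 1.0 / (1.0 + speedup_percent / 100.0)


if __name__ == "__main__":
    # 16 destinations, matching the port count of a 16-to-16 crossbar.
    destinations = 16
    print("unicast writes:  ", unicast_transactions(destinations))
    print("multicast writes:", multicast_transactions(destinations))
    # The 29% figure is the matrix multiplication speedup reported in the findings.
    print("runtime saved:    {:.1%}".format(runtime_saved_fraction(29.0)))
```

Under this simplified model the master issues one write instead of sixteen. The end-to-end gain on a real kernel is smaller than that ratio because data distribution is only part of the total runtime.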
