Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement
Yunhao Deng, Fanchen Kong, Xiaoling Yi, Ryan Antonio, Marian Verhelst

TL;DR
Torrent introduces a distributed DMA architecture that efficiently supports point-to-multipoint data transfers in SoCs without modifying existing NoC hardware, significantly improving performance and scalability for data-parallel workloads.
Contribution
The paper presents Chainwrite, a novel mechanism for P2MP data movement that requires no hardware modifications, along with new scheduling algorithms that optimize the transfer chains.
Findings
Torrent achieves up to a 7.88x speedup over a unicast baseline.
Its ASIC implementation incurs small area (1.2%) and power (2.3%) overheads.
It demonstrates greater scalability and flexibility than network-layer multicast.
Abstract
The growing disparity between computational power and on-chip communication bandwidth is a critical bottleneck in modern Systems-on-Chip (SoCs), especially for data-parallel workloads like AI. Efficient point-to-multipoint (P2MP) data movement, such as multicast, is essential for high performance. However, native multicast support is lacking in standard interconnect protocols. Existing P2MP solutions, such as multicast-capable Networks-on-Chip (NoCs), impose additional overhead on the network hardware and require modifications to the interconnect protocol, compromising scalability and compatibility. This paper introduces Torrent, a novel distributed DMA architecture that enables efficient P2MP data transfers without modifying the NoC hardware or the interconnect protocol. Torrent conducts P2MP data transfers by forming logical chains over the NoC, where the data traverses through targeted…
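The chain idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2D mesh coordinates, hop count as Manhattan distance, and the greedy nearest-neighbour ordering, which stands in for (and is not) Torrent's own scheduling algorithms. It only compares how many link hops a payload crosses when the source sends it to every destination separately versus when it is forwarded along one logical chain.

```python
from typing import List, Tuple

# Hypothetical model: nodes are (x, y) coordinates on a 2D mesh NoC.
Node = Tuple[int, int]


def hops(a: Node, b: Node) -> int:
    """Hop count between two mesh nodes, assumed to be the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_chain(src: Node, dests: List[Node]) -> List[Node]:
    """Order destinations by repeatedly visiting the nearest remaining one.

    This is a simple stand-in heuristic, not the paper's scheduler.
    Ties are broken by coordinate so the result is deterministic.
    """
    chain: List[Node] = []
    current = src
    remaining = list(dests)
    while remaining:
        nxt = min(remaining, key=lambda d: (hops(current, d), d))
        remaining.remove(nxt)
        chain.append(nxt)
        current = nxt
    return chain


def unicast_hops(src: Node, dests: List[Node]) -> int:
    """Total hops when the source sends one separate copy to each destination."""
    return sum(hops(src, d) for d in dests)


def chain_hops(src: Node, chain: List[Node]) -> int:
    """Total hops when each node in the chain forwards the data to the next."""
    total = 0
    current = src
    for d in chain:
        total += hops(current, d)
        current = d
    return total


if __name__ == "__main__":
    source = (0, 0)
    targets = [(3, 0), (1, 0), (2, 0), (2, 1)]
    order = greedy_chain(source, targets)
    print("chain order:", order)
    print("unicast hops:", unicast_hops(source, targets))
    print("chain hops:", chain_hops(source, order))
```

Hop count here is only a rough proxy for link traffic. It says nothing about latency, pipelining, or the speedups reported for Torrent.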
Taxonomy
Topics: Interconnection Networks and Systems · Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance
