Mitigating the Bandwidth Wall via Data-Streaming System-Accelerator Co-Design
Qunyou Liu, Marina Zapater, David Atienza

TL;DR
This paper presents a co-designed system and accelerator for transformer inference. It raises end-to-end throughput by streaming data in page-sized tiles and overlapping computation with data transfer, which targets the interconnect-bandwidth limits that often bound hardware accelerators.
Contribution
It introduces two jointly optimized components for transformer inference: MatrixFlow, a systolic-array accelerator built around a novel page-aligned matrix multiplication, and Gem5-AcceSys, a simulation platform for exploring how the accelerator is integrated into the surrounding system.
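To make "page-aligned" concrete, here is a minimal sketch of block matrix multiplication where each tile exactly fills one 4 KB page. It is not the paper's implementation. It assumes row-major storage, 4-byte elements, and square tiles, so a page holds a 32x32 tile. The helpers `page_tile_dim` and `blocked_matmul` are hypothetical names. Under these assumptions, one tile each of A, B, and C takes 12 KB, which would fit in an on-chip buffer of about 20 KB.

```python
# Hypothetical sketch of page-aligned block matrix multiplication.
# Assumptions (not from the paper): row-major data, 4-byte elements,
# and square T x T tiles whose bytes exactly fill one 4 KB page.
import math

PAGE_BYTES = 4096  # 4 KB page / tile size
ELEM_BYTES = 4     # assumed element size (e.g. fp32 or int32)


def page_tile_dim(page_bytes: int = PAGE_BYTES, elem_bytes: int = ELEM_BYTES) -> int:
    """Return T such that a square T x T tile occupies exactly one page."""
    elems = page_bytes // elem_bytes
    t = math.isqrt(elems)
    if t * t != elems:
        raise ValueError("a page cannot hold a square tile of this element size")
    return t


def blocked_matmul(a, b, t):
    """Compute C = A @ B one T x T tile at a time.

    Each innermost block touches at most one tile (one page) of A, B, and C,
    which is what lets tiles be moved between memory and the accelerator
    as whole pages.
    """
    n, k = len(a), len(a[0])
    m = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, t):          # tile row of C
        for j0 in range(0, m, t):      # tile column of C
            for k0 in range(0, k, t):  # stream A and B tiles along K
                for i in range(i0, min(i0 + t, n)):
                    row_c = c[i]
                    for kk in range(k0, min(k0 + t, k)):
                        aik = a[i][kk]
                        row_b = b[kk]
                        for j in range(j0, min(j0 + t, m)):
                            row_c[j] += aik * row_b[j]
    return c
```

For example, `page_tile_dim()` gives 32 for 4-byte elements and `page_tile_dim(4096, 1)` gives 64 for 1-byte elements. The output tile stays resident while A and B tiles stream past it, which is one common ordering; the paper's actual dataflow may differ.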
Findings
Up to 22x speedup over CPU baseline
5x to 8x improvements over existing accelerators
Achieves 80% of on-device HBM performance while using host memory accessed over PCIe
Abstract
Transformers have revolutionized AI in natural language processing and computer vision, but their large computation and memory demands pose major challenges for hardware acceleration. In practice, end-to-end throughput is often limited by paged data movement and interconnect bandwidth rather than raw MAC count. This work proposes a unified system-accelerator co-design approach for transformer inference that jointly optimizes a matrix accelerator and its system integration through paged streaming dataflows and explicit overlap of compute and transfer. On the hardware side, we introduce MatrixFlow, a loosely coupled 16x16 systolic-array accelerator with a page-aligned block matrix multiplication method using 4 KB tiles, a small on-chip buffer of about 20 KB, and a pipelined schedule of DMA, compute, and DMA-out to utilize interconnect bandwidth efficiently. On the system side, we develop…
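The benefit of overlapping DMA-in, compute, and DMA-out can be estimated with a simple timing model. The sketch below is not from the paper. It assumes each stage is a single resource that handles one tile at a time, per-tile latencies are constant, and buffering never stalls a stage. The function names are hypothetical. Once the pipeline fills, the slowest stage sets the rate, which is why interconnect transfer time can dominate end-to-end throughput.

```python
# Hypothetical timing model of a three-stage tile pipeline:
# DMA-in -> compute -> DMA-out.
# Assumptions (not from the paper): each stage processes one tile at a time,
# per-tile stage latencies are constant, and there is always a free buffer.
from typing import Sequence


def serial_time(n_tiles: int, stages: Sequence[float]) -> float:
    """No overlap: each tile finishes all stages before the next one starts."""
    return n_tiles * sum(stages)


def pipelined_time(n_tiles: int, stages: Sequence[float]) -> float:
    """Overlapped schedule: a tile enters stage s only after it has left
    stage s-1 and the previous tile has left stage s."""
    finish = [0.0] * len(stages)  # finish[s]: when the previous tile left stage s
    for _ in range(n_tiles):
        ready = 0.0  # when the current tile left its previous stage
        for s, latency in enumerate(stages):
            start = max(ready, finish[s])
            finish[s] = start + latency
            ready = finish[s]
    return finish[-1]


def pipelined_time_closed_form(n_tiles: int, stages: Sequence[float]) -> float:
    """Fill latency plus one bottleneck-stage interval per additional tile."""
    if n_tiles == 0:
        return 0.0
    return sum(stages) + (n_tiles - 1) * max(stages)
```

With assumed per-tile latencies of (2, 3, 1) time units and 4 tiles, the serial schedule takes 24 units and the overlapped one takes 15. As the tile count grows, the overlapped schedule approaches one tile per max(stages) units.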
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Neural Network Applications
