Addressing memory bandwidth scalability in vector processors for streaming applications
Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

TL;DR
This paper proposes a new extended memory hierarchy with multiple on-chip memory levels and data-shufflers to alleviate memory bandwidth bottlenecks in data-parallel AI/ML applications, improving scalability.
Contribution
It introduces a novel memory hierarchy architecture with three on-chip levels and data-shufflers, enhancing data reuse and bandwidth efficiency for AI/ML workloads.
Findings
Improved memory bandwidth utilization compared to GPUs and systolic array accelerators.
Enhanced data reuse in CNN applications with the proposed architecture.
Quantified performance benefits over existing accelerators.
Abstract
As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory bandwidth challenges in data-parallel AI/ML applications. While data-parallel architectures like GPUs and neural network accelerators have improved power performance compared to traditional CPUs, they can still be significantly bottlenecked by their memory bandwidth, especially when the data reuse in the loop kernels is limited. Systolic arrays (SAs) and GPUs attempt to mitigate the memory bandwidth bottleneck but can still become bandwidth-throttled when the amount of data reuse is not sufficient to confine data accesses mostly to the local memories close to the processing. To mitigate this, the proposed architecture introduces three levels of on-chip…
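The link the abstract draws between limited data reuse and bandwidth throttling can be illustrated with the standard roofline bound. The sketch below is not taken from the paper: the function names and the machine figures (1000 GFLOP/s peak, 100 GB/s off-chip bandwidth) are hypothetical, chosen only to show how a kernel with low reuse (few operations per byte fetched) is capped by bandwidth rather than by compute.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: the lower of the compute peak and
    bandwidth multiplied by arithmetic intensity (FLOPs per byte moved)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def is_bandwidth_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """True when the memory system, not the datapath, limits throughput."""
    return bandwidth_gbs * flops_per_byte < peak_gflops


# Hypothetical machine parameters (assumptions, not figures from the paper).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Low data reuse: 0.5 FLOPs per byte fetched from memory.
low_reuse = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 0.5)

# High data reuse: 20 FLOPs per byte fetched from memory.
high_reuse = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 20.0)

print(low_reuse, high_reuse)
```

Under these assumed numbers the low-reuse kernel reaches only a small fraction of peak, which is the regime the abstract describes; adding on-chip levels that capture more reuse effectively raises the FLOPs-per-byte seen by the off-chip interface.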
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
