A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-core RV Clusters for Attention-Based Model Deployment

Bowen Wang; Marco Bertuletti; Yichao Zhang; Victor J.B. Jung; Luca Benini

arXiv:2508.01180·cs.AR·August 5, 2025

TL;DR

This paper introduces DAS, a runtime-programmable address-remapping hardware unit that improves data locality in large shared-memory clusters running attention-based models, reaching a 1.94x speedup on ViT-L/16 with less than 0.1% area overhead.

Contribution

The paper presents DAS, a dynamic allocation scheme that pairs a runtime-programmable address-remapping hardware unit with a unified memory allocator to improve data locality and throughput in large RISC-V clusters running ML workloads.
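
To make the idea of address remapping for locality concrete, here is a minimal sketch. It is not the DAS hardware or its allocator; the tile and bank counts, the function names, and the chunked mapping are all hypothetical, chosen only to contrast a fully interleaved bank mapping with one that keeps a contiguous chunk of words inside the banks nearest to one group of PEs. Only the bank index is modeled; the row offset within a bank is omitted.

```python
# Illustrative only: every constant and function here is a hypothetical
# assumption, not taken from the paper.
WORD_BYTES = 4
BANKS_PER_TILE = 4   # hypothetical banks local to one group of PEs
NUM_TILES = 8        # hypothetical number of PE groups (tiles)
NUM_BANKS = BANKS_PER_TILE * NUM_TILES


def interleaved_bank(addr):
    """Word-level interleaving: consecutive words spread over all banks."""
    return (addr // WORD_BYTES) % NUM_BANKS


def remapped_bank(addr, region_base, rows_per_tile):
    """Hypothetical remap: a chunk of consecutive words stays in one tile.

    Words are still interleaved across the banks inside the tile, but a
    chunk of BANKS_PER_TILE * rows_per_tile words never leaves that tile.
    """
    word = (addr - region_base) // WORD_BYTES
    chunk_words = BANKS_PER_TILE * rows_per_tile
    tile = (word // chunk_words) % NUM_TILES
    bank_in_tile = word % BANKS_PER_TILE
    return tile * BANKS_PER_TILE + bank_in_tile


def tile_of(bank):
    """Tile that physically owns a given bank."""
    return bank // BANKS_PER_TILE


def local_fraction(bank_fn, pe_tile, addrs):
    """Fraction of accesses that land in the requesting PE's own tile."""
    hits = sum(1 for a in addrs if tile_of(bank_fn(a)) == pe_tile)
    return hits / len(addrs)


if __name__ == "__main__":
    # A PE in tile 0 reads 16 consecutive words starting at address 0.
    addrs = [i * WORD_BYTES for i in range(16)]
    print(local_fraction(interleaved_bank, 0, addrs))
    print(local_fraction(lambda a: remapped_bank(a, 0, 4), 0, addrs))
```

Under these assumptions, the interleaved mapping sends most of the PE's accesses to remote tiles, while the chunked mapping keeps them local. A runtime-programmable unit would let software pick the chunk size per region rather than fixing it at design time.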

Findings

01 Achieves 1.94x speedup on ViT-L/16 model

02 Reduces data access contention in shared L1 memory

03 Incurs less than 0.1% area overhead in 12nm technology

Abstract

Attention-based models demand flexible hardware to manage diverse kernels with varying arithmetic intensities and memory access patterns. Large clusters with shared L1 memory, a common architectural pattern, struggle to fully utilize their processing elements (PEs) when scaled up, because throughput drops in the hierarchical PE-to-L1 intra-cluster interconnect. This paper presents the Dynamic Allocation Scheme (DAS), a runtime-programmable address-remapping hardware unit coupled with a unified memory allocator, designed to minimize contention among PEs accessing the multi-banked L1. We evaluated DAS on an aggressively scaled-up 1024-PE RISC-V cluster with a Non-Uniform Memory Access (NUMA) PE-to-L1 interconnect to demonstrate its potential for improving data locality in large parallel machine learning workloads. For a Vision Transformer (ViT)-L/16 model, each encoder layer executes in…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Neural Network Applications