CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

Tianhao Cai; Liang Wang; Limin Xiao; Meng Han; Zeyu Wang; Lin Sun; Xiaojian Liao

arXiv:2505.06625 · cs.AR · May 15, 2025

TL;DR

CaMDN is an architecture-scheduling co-design that improves cache efficiency for multi-tenant DNNs on integrated NPUs. It eliminates unexpected contention in the shared cache and raises cache utilization, reducing memory access by 33.4% on average and achieving up to 2.56× speedup for co-located DNNs.

Contribution

CaMDN pairs a lightweight architecture, which supports model-exclusive, NPU-controlled regions inside the shared cache, with a cache scheduling method that combines cache-aware mapping and dynamic runtime allocation among co-located DNNs.
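
The idea of dividing a shared cache among co-located DNNs can be illustrated with a small sketch. This is not the paper's algorithm: it assumes, hypothetically, that each model offers a few precomputed mapping candidates, each needing an exclusive cache region of some size and producing an estimated amount of off-chip traffic. A simple greedy rule then hands cache to whichever upgrade saves the most traffic per KB. The names `Mapping`, `allocate`, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping candidate for one DNN (not from the paper)."""
    cache_kb: int      # size of the exclusive cache region this mapping needs
    dram_traffic: int  # estimated off-chip accesses with that region (arbitrary units)


def allocate(candidates: dict[str, list[Mapping]], total_kb: int) -> dict[str, Mapping]:
    """Greedy illustration: repeatedly grant the upgrade with the best
    traffic reduction per extra KB until nothing else fits."""
    # Order each model's candidates from smallest to largest cache need.
    options = {m: sorted(c, key=lambda x: x.cache_kb) for m, c in candidates.items()}
    choice = {m: 0 for m in options}  # every model starts at its smallest mapping
    used = sum(opts[0].cache_kb for opts in options.values())
    if used > total_kb:
        raise ValueError("even the smallest mappings do not fit in the cache")

    while True:
        best = None  # (gain per KB, model, candidate index, extra KB)
        for model, opts in options.items():
            current = opts[choice[model]]
            for i in range(choice[model] + 1, len(opts)):
                extra = opts[i].cache_kb - current.cache_kb
                saved = current.dram_traffic - opts[i].dram_traffic
                if saved <= 0 or used + extra > total_kb:
                    continue  # no benefit, or it would exceed the capacity
                gain = saved / extra if extra > 0 else float("inf")
                if best is None or gain > best[0]:
                    best = (gain, model, i, extra)
        if best is None:
            break  # no remaining upgrade fits or helps
        _, model, index, extra = best
        choice[model] = index
        used += extra

    return {m: options[m][choice[m]] for m in options}


# Two hypothetical co-located models sharing a 512 KB cache budget.
demo = {
    "model_a": [Mapping(0, 100), Mapping(256, 60), Mapping(512, 50)],
    "model_b": [Mapping(0, 80), Mapping(256, 50)],
}
result = allocate(demo, total_kb=512)
for name, mapping in result.items():
    print(name, mapping.cache_kb, mapping.dram_traffic)
```

A runtime version would rerun such a decision whenever a model arrives or finishes. How CaMDN actually selects mappings and adjusts regions is described in the paper itself, not in this sketch.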

Findings

01 Reduces memory access by 33.4% on average.

02 Achieves up to 2.56× speedup for co-located DNNs.

03 Improves cache utilization and performance in multi-tenant environments.

Abstract

With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture is proposed to support model-exclusive, NPU-controlled regions inside shared cache to eliminate unexpected cache contention. Moreover, a cache scheduling method is proposed to improve shared cache utilization. In particular, it includes a cache-aware mapping method for adaptability to the varying available cache capacity and a dynamic allocation algorithm to adjust the usage among co-located DNNs at runtime. Compared to prior…

Taxonomy

Topics: IoT and Edge/Fog Computing · Caching and Content Delivery · Brain Tumor Detection and Classification