CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs
Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao

TL;DR
CaMDN is an architecture-scheduling co-design that improves cache efficiency for multi-tenant DNNs on integrated NPUs. It reduces shared-cache contention and improves cache utilization, cutting memory access by 33.4% on average and delivering up to 2.56× speedup for co-located DNNs.
Contribution
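CaMDN introduces an architecture-scheduling co-design for multi-tenant DNNs on integrated NPUs: a lightweight architecture that supports model-exclusive, NPU-controlled regions inside the shared cache, plus a cache scheduling method that combines cache-aware mapping with dynamic allocation at runtime.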
Findings
Reduces memory access by 33.4% on average.
Achieves up to 2.56× speedup for co-located DNNs.
Improves cache utilization and performance in multi-tenant environments.
Abstract
With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although prior work proposes many methods to improve multi-tenant performance, the impact of the shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture is proposed to support model-exclusive, NPU-controlled regions inside the shared cache, eliminating unexpected cache contention. Moreover, a cache scheduling method is proposed to improve shared cache utilization. It includes a cache-aware mapping method that adapts to the varying available cache capacity and a dynamic allocation algorithm that adjusts cache usage among co-located DNNs at runtime. Compared to prior…
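To make the idea of model-exclusive regions with runtime reallocation concrete, here is a minimal toy sketch. It is not the paper's algorithm: the `CachePartitioner` class, the way-based granularity, the per-model `demands` values, and the proportional-share policy are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CachePartitioner:
    """Toy model: a shared cache of `total_ways` split into exclusive regions.

    Hypothetical illustration only; the policy below is NOT taken from CaMDN.
    """
    total_ways: int
    regions: dict = field(default_factory=dict)  # model name -> ways owned

    def free_ways(self) -> int:
        """Ways not currently assigned to any model."""
        return self.total_ways - sum(self.regions.values())

    def rebalance(self, demands: dict) -> dict:
        """Reassign ways in proportion to each model's (assumed) demand."""
        total = sum(demands.values())
        if total == 0:
            # No model asks for cache: every region shrinks to zero ways.
            self.regions = {m: 0 for m in demands}
            return self.regions
        # Floor of each model's proportional share.
        shares = {m: self.total_ways * d // total for m, d in demands.items()}
        leftover = self.total_ways - sum(shares.values())
        # Hand remaining ways, one each, to the models with the largest demand.
        for m in sorted(demands, key=demands.get, reverse=True)[:leftover]:
            shares[m] += 1
        self.regions = shares
        return shares


if __name__ == "__main__":
    cache = CachePartitioner(total_ways=16)
    # Three co-located models with made-up relative demands.
    print(cache.rebalance({"resnet": 6, "bert": 3, "mobilenet": 1}))
    # One model finishes; the remaining two are rebalanced at "runtime".
    print(cache.rebalance({"resnet": 6, "bert": 3}))
```

Because each model only ever touches its own region in this sketch, one model's accesses cannot evict another's data, which is the contention-avoidance property the abstract describes. How CaMDN actually sizes and adjusts regions is not specified in the text above.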
Taxonomy
Topics: IoT and Edge/Fog Computing · Caching and Content Delivery · Brain Tumor Detection and Classification
