Scalable Hierarchical Instruction Cache for Ultra-Low-Power Processors Clusters
Jie Chen, Igor Loi, Eric Flamand, Giuseppe Tagliavini, Luca Benini, Davide Rossi

TL;DR
This paper introduces a scalable hierarchical instruction cache for ultra-low-power processor clusters, improving performance and scalability while maintaining energy efficiency, suitable for IoT end-nodes.
Contribution
It proposes a two-level instruction cache in which private L1 caches share a larger L1.5 cache, combined with a next-line prefetcher from L1 to L1.5, to improve scalability and performance in ultra-low-power clusters.
Findings
Up to 20% higher operating frequency.
Maximum performance improved by up to 17%.
Maintains similar energy efficiency for key applications.
Abstract
High performance and energy efficiency are critical requirements for Internet of Things (IoT) end-nodes. Exploiting tightly-coupled clusters of programmable processors (CMPs) has recently emerged as a suitable solution to address this challenge. One of the main bottlenecks limiting the performance and energy efficiency of these systems is the instruction cache architecture, due to its criticality in terms of timing (i.e., maximum operating frequency), bandwidth, and power. We propose a hierarchical instruction cache tailored to ultra-low-power tightly-coupled processor clusters, where a relatively large cache (L1.5) is shared by L1 private caches through a two-cycle latency interconnect. To address the performance loss caused by the L1 capacity misses, we introduce a next-line prefetcher with cache probe filtering (CPF) from L1 to L1.5. We optimize the core instruction fetch (IF) stage by…
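The hierarchy described in the abstract (private L1 caches in front of a shared L1.5, with a next-line prefetcher whose requests are filtered by probing the L1 first) can be illustrated with a small behavioural model. This is only a sketch under assumptions that are not taken from the paper: the 16-byte line size, the cache capacities, the fully associative LRU organisation, and the choice to attempt a prefetch on every fetch are all hypothetical, and the two-cycle interconnect latency is not modelled. The class and field names (`LRUCache`, `PrivateL1`, `l15_hits`) are invented for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 16  # assumed line size, not taken from the paper


class LRUCache:
    """Fully associative LRU store of line indices (a simplification)."""

    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.lines = OrderedDict()

    def probe(self, line):
        # Tag check only: reports presence without touching LRU order.
        return line in self.lines

    def access(self, line):
        # Demand access: on a hit, mark the line as most recently used.
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        return False

    def fill(self, line):
        # Insert a line, evicting the least recently used one if full.
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.n_lines:
            self.lines.popitem(last=False)


class PrivateL1:
    """Per-core L1 backed by a shared L1.5 (hypothetical model)."""

    def __init__(self, n_lines, shared):
        self.cache = LRUCache(n_lines)
        self.shared = shared
        self.stats = {"hits": 0, "misses": 0, "prefetches": 0,
                      "filtered": 0, "l15_hits": 0}

    def _refill(self, line):
        # Bring a line into L1 from L1.5; an L1.5 miss is filled from L2.
        if self.shared.access(line):
            self.stats["l15_hits"] += 1
        else:
            self.shared.fill(line)
        self.cache.fill(line)

    def fetch(self, addr):
        line = addr // LINE_BYTES
        if self.cache.access(line):
            self.stats["hits"] += 1
        else:
            self.stats["misses"] += 1
            self._refill(line)
        # Next-line prefetch with probe filtering: only send the request
        # to L1.5 when the next line is not already present in L1.
        nxt = line + 1
        if self.cache.probe(nxt):
            self.stats["filtered"] += 1
        else:
            self.stats["prefetches"] += 1
            self._refill(nxt)


# Two cores run the same straight-line code: 8 lines, 4-byte instructions.
shared_l15 = LRUCache(64)
core0 = PrivateL1(4, shared_l15)
core1 = PrivateL1(4, shared_l15)
for core in (core0, core1):
    for addr in range(0, 8 * LINE_BYTES, 4):
        core.fetch(addr)
print(core0.stats)
print(core1.stats)
```

In this toy run, the probe suppresses prefetch requests for lines that are already in L1, so the shared cache only sees one request per new line, and the second core's refills are served from lines the first core already brought into L1.5. The numbers say nothing about the paper's measured results; they only show the mechanism.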
