Heterogeneous Dual-Core Overlay Processor for Light-Weight CNNs
Tiandong Zhao, Yunxuan Yu, Kun Wang, Lei He

TL;DR
This paper introduces a heterogeneous dual-core processor architecture optimized for light-weight CNNs, significantly improving runtime efficiency and throughput by leveraging specialized cores and a novel scheduling algorithm.
Contribution
It proposes the first in-depth heterogeneous dual-core architecture for light-weight CNNs, optimizing cores for different layer types and developing a scheduling algorithm for enhanced parallelism.
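As a rough illustration of why scheduling matters on two specialized cores, the sketch below interleaves the layers of several input images so that one core can work on one image while the other core works on another. It is a naive greedy simulation written for this summary, not the paper's scheduling algorithm. The core names `"conv"` and `"dw"`, the function `simulate_interleaved`, and the latency values are all hypothetical.

```python
def simulate_interleaved(layers, num_images):
    """Greedy simulation of two specialized cores shared by several images.

    layers: list of (core_name, latency) pairs, executed in order per image.
    Returns the total time until every image has finished every layer.
    Assumption: each core runs one layer at a time, with no transfer cost.
    """
    core_free = {"conv": 0, "dw": 0}   # time at which each core becomes idle
    image_ready = [0] * num_images     # time at which each image's previous layer ends
    next_layer = [0] * num_images      # index of the next layer to run per image
    remaining = num_images * len(layers)

    while remaining:
        best = None
        for img in range(num_images):
            if next_layer[img] == len(layers):
                continue  # this image is done
            core, latency = layers[next_layer[img]]
            start = max(core_free[core], image_ready[img])
            # Pick the layer that can start earliest; ties go to the lower image index.
            if best is None or start < best[0]:
                best = (start, img, core, latency)
        start, img, core, latency = best
        end = start + latency
        core_free[core] = end
        image_ready[img] = end
        next_layer[img] += 1
        remaining -= 1

    return max(image_ready)


def simulate_sequential(layers, num_images):
    """Baseline: one layer at a time, one image after another."""
    return num_images * sum(latency for _, latency in layers)


# Hypothetical network: alternating regular and depthwise layers.
toy_layers = [("conv", 2), ("dw", 1), ("conv", 2), ("dw", 1)]
interleaved = simulate_interleaved(toy_layers, num_images=2)
sequential = simulate_sequential(toy_layers, num_images=2)
```

With these made-up latencies the interleaved schedule finishes sooner than the sequential baseline, because the depthwise core handles one image while the regular-convolution core handles the other. The gap depends on how evenly the work is split between the two cores.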
Findings
11% improvement in runtime PE efficiency
31% increase in throughput for single network
11% throughput gain for multiple networks
Abstract
Light-weight convolutional neural networks (CNNs) have low complexity and are good candidates for low-power, high-throughput inference. Such networks are heterogeneous in terms of computation-to-communication (CTC) ratios and computation patterns between layers, especially for different layer types. Yet, existing AI processors either use homogeneous processing elements (PEs), resulting in low runtime PE efficiency, or run different layers on heterogeneous PEs sequentially, introducing resource redundancy. This paper proposes a heterogeneous dual-core architecture (dual-OPU), where one core is optimized for regular convolution layers and the other for depthwise convolution layers. PEs are homogeneous within each core. To make full use of dual-core parallelism, we develop a scheduling algorithm to concurrently execute layers for different input images on the dual-core and balance parallel…
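To make the CTC heterogeneity mentioned in the abstract concrete, the sketch below estimates a computation-to-communication ratio for a regular and a depthwise convolution layer of the same shape. The cost model is a simplifying assumption made for this summary, not the paper's: stride 1, "same" padding, compute counted in multiply-accumulates (MACs), and every input, weight, and output element moved exactly once. The function name `layer_ctc` is hypothetical.

```python
def layer_ctc(height, width, c_in, c_out, kernel, depthwise=False):
    """Return (macs, elements_moved, macs_per_element) for one conv layer.

    Assumed model: stride 1, same padding, each input, weight and output
    element is transferred once (no tiling or on-chip reuse limits).
    """
    if depthwise:
        if c_in != c_out:
            raise ValueError("depthwise convolution keeps the channel count")
        # One k x k filter per channel; channels are not mixed.
        macs = height * width * c_in * kernel * kernel
        weights = c_in * kernel * kernel
    else:
        # Every output channel reads every input channel.
        macs = height * width * c_in * c_out * kernel * kernel
        weights = c_in * c_out * kernel * kernel

    inputs = height * width * c_in
    outputs = height * width * c_out
    moved = inputs + weights + outputs
    return macs, moved, macs / moved


# Same feature-map shape, two layer types.
regular = layer_ctc(56, 56, 128, 128, 3)
depthwise = layer_ctc(56, 56, 128, 128, 3, depthwise=True)
```

For this 56x56 feature map with 128 channels and 3x3 kernels, the toy model gives the regular layer several hundred MACs per element moved and the depthwise layer fewer than five. That gap is the kind of mismatch that motivates giving each layer type its own core.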
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Memory and Neural Computing
