TL;DR
DORY is an automatic tool that efficiently deploys deep neural networks on low-cost IoT microcontrollers, optimizing memory use and execution speed for real-world edge applications.
Contribution
DORY formulates layer tiling as a Constraint Programming problem that maximizes L1 memory utilization on MCUs with limited on-chip memory, then generates C code for end-to-end inference.
Findings
Up to 2.5x better MAC/cycle on GAP8 than the vendor's proprietary software solution
18.1x better MAC/cycle on single layers than the state-of-the-art result on an STM32-F746 MCU
Enables end-to-end inference of 1.0-MobileNet-128 on GAP8 at 4.3 fps, consuming 63 pJ/MAC on average
Abstract
The deployment of Deep Neural Networks (DNNs) on end-nodes at the extreme edge of the Internet-of-Things is a critical enabler to support pervasive Deep Learning-enhanced applications. Low-cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads to reduce area overheads and increase energy efficiency, which requires explicit DMA-based memory transfers between different levels of the memory hierarchy. Mapping modern DNNs on these systems requires aggressive topology-dependent tiling and double-buffering. In this work, we propose DORY (Deployment Oriented to memoRY), an automatic tool to deploy DNNs on low-cost MCUs with typically less than 1MB of on-chip SRAM memory. DORY abstracts tiling as a Constraint Programming (CP) problem: it maximizes L1 memory utilization under the topological constraints imposed by each DNN layer. Then, it generates ANSI C…
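To make the tiling formulation concrete, the sketch below picks a tile for a single convolution layer by maximizing L1 usage under a memory budget. It is an illustration only, not DORY's solver: it uses exhaustive search instead of a CP solver, and the function names (`conv_tile_bytes`, `best_tile`), the 8-bit element size, the unpadded k x k convolution, the choice to tile only output channels and output height/width, and the factor of 2 applied to every buffer for double-buffering are all assumptions made for the example.

```python
from itertools import product


def conv_tile_bytes(c_in, c_out, h_out, w_out, k=3, stride=1, bytes_per_elem=1):
    """Bytes needed in L1 for one tile of a k x k convolution without padding.

    The input tile must cover the receptive field of the output tile, which is
    the topological constraint linking input and output tile sizes.
    """
    h_in = (h_out - 1) * stride + k
    w_in = (w_out - 1) * stride + k
    input_bytes = c_in * h_in * w_in
    output_bytes = c_out * h_out * w_out
    weight_bytes = c_out * c_in * k * k
    return (input_bytes + output_bytes + weight_bytes) * bytes_per_elem


def best_tile(c_in, c_out, h_out, w_out, l1_budget, k=3, stride=1):
    """Return (l1_bytes, c_out_tile, h_out_tile, w_out_tile), or None if nothing fits.

    Objective: maximize L1 bytes used.
    Constraint: 2 * tile footprint <= l1_budget (hypothetical double-buffering model).
    """
    best = None
    candidates = product(
        range(1, c_out + 1), range(1, h_out + 1), range(1, w_out + 1)
    )
    for c_t, h_t, w_t in candidates:
        used = 2 * conv_tile_bytes(c_in, c_t, h_t, w_t, k, stride)
        if used > l1_budget:
            continue  # tile does not fit in L1
        if best is None or used > best[0]:
            best = (used, c_t, h_t, w_t)
    return best


if __name__ == "__main__":
    # Hypothetical layer and a 44 KiB L1 budget, chosen only for illustration.
    print(best_tile(c_in=16, c_out=32, h_out=16, w_out=16, l1_budget=44 * 1024))
```

Exhaustive search is workable only for small layers. A CP formulation expresses the same objective and constraints declaratively and can also add further terms that this sketch ignores.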
