YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers
Young D. Kwon, Jagmohan Chauhan, and Cecilia Mascolo

TL;DR
YONO is a novel approach that compresses multiple heterogeneous neural networks and enables multi-task learning on microcontrollers by storing the models compactly and switching between them efficiently, with minimal accuracy loss.
Contribution
YONO introduces a product quantization (PQ) based compression method and an online execution framework for running multiple task-specific neural networks on MCUs, addressing their tight resource constraints and enabling diverse IoT applications.
Findings
Achieves up to 12.37× compression with negligible accuracy loss.
Reduces model switching latency and energy consumption by over 93%.
Demonstrates generalizability across various architectures and datasets.
Abstract
With the advancement of Deep Neural Networks (DNNs) and large amounts of sensor data from Internet of Things (IoT) systems, the research community has worked to reduce the computational and resource demands of DNNs so that they can run on low-resource microcontrollers (MCUs). However, most of the current work in embedded deep learning focuses on solving a single task efficiently, while the multi-tasking nature and applications of IoT devices demand systems that can simultaneously handle a diverse range of tasks (activity, voice, and context recognition) with input from a variety of sensors. In this paper, we propose YONO, a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching for dissimilar multi-task learning on MCUs. We first adopt PQ to learn codebooks that store weights of different models. Also, we…
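The abstract says YONO adopts PQ to learn codebooks that store model weights. The sketch below illustrates generic product quantization of a single weight matrix in plain Python, so the idea of "codebooks plus indices" is concrete. It is not YONO's method: the per-subspace split, the plain Lloyd's k-means, the function names (`pq_compress`, `pq_decompress`, `compression_ratio`), and the cost model (32-bit float codewords, bit-packed indices) are all assumptions for illustration. The abstract describes codebooks that store the weights of different models; this sketch handles one matrix only.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (lists of floats)."""
    rng = random.Random(seed)
    # Initialize with k input points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pq_compress(weights, num_subspaces, num_centroids):
    """Split each row into sub-vectors and learn one codebook per subspace (hypothetical helper)."""
    rows, cols = len(weights), len(weights[0])
    if cols % num_subspaces:
        raise ValueError("cols must be divisible by num_subspaces")
    d = cols // num_subspaces  # length of each sub-vector
    codebooks = []
    codes = [[0] * num_subspaces for _ in range(rows)]
    for m in range(num_subspaces):
        subvecs = [tuple(row[m * d:(m + 1) * d]) for row in weights]
        book = kmeans(subvecs, num_centroids)
        codebooks.append(book)
        for r, v in enumerate(subvecs):
            # Store only the index of the nearest codeword.
            codes[r][m] = min(range(num_centroids), key=lambda c: sq_dist(v, book[c]))
    return codebooks, codes


def pq_decompress(codebooks, codes):
    """Rebuild each row by concatenating the codewords its indices point to."""
    return [[x for m, c in enumerate(row_codes) for x in codebooks[m][c]]
            for row_codes in codes]


def compression_ratio(rows, cols, num_subspaces, num_centroids, float_bits=32):
    """Dense float storage divided by (bit-packed indices + float codebooks)."""
    index_bits = math.ceil(math.log2(num_centroids))
    d = cols // num_subspaces
    original = rows * cols * float_bits
    compressed = (rows * num_subspaces * index_bits
                  + num_subspaces * num_centroids * d * float_bits)
    return original / compressed


if __name__ == "__main__":
    rng = random.Random(42)
    W = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(64)]
    books, idx = pq_compress(W, num_subspaces=4, num_centroids=8)
    W_hat = pq_decompress(books, idx)
    mse = sum(sq_dist(a, b) for a, b in zip(W, W_hat)) / (64 * 16)
    print(f"reconstruction MSE: {mse:.4f}")
    print(f"compression ratio: {compression_ratio(64, 16, 4, 8):.2f}x")
```

For the toy 64×16 matrix above, the assumed cost model gives 32768 bits of dense weights versus 768 bits of 3-bit indices plus 4096 bits of codebooks, roughly 6.74×. The ratio grows as the matrix gets larger relative to the codebooks, which is one reason sharing codebooks across weights pays off. This toy figure is unrelated to the 12.37× reported for YONO.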
