MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Abhiroop Bhattacharjee, Yeshwanth Venkatesha, Abhishek Moitra, and Priyadarshini Panda

arXiv:2204.05274·cs.LG·June 22, 2022

TL;DR

MIME is a co-designed algorithm-hardware approach that enables memory-efficient, energy-saving multi-task inference by reusing weights and applying dynamic pruning on a single neural network.

Contribution

MIME introduces a novel method for multi-task inference that reuses parent task weights and learns task-specific thresholds, improving memory and energy efficiency.
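The memory argument can be illustrated with a rough parameter count. This is only a sketch under stated assumptions: the function names and all sizes below are hypothetical, one threshold per neuron is assumed, and the numbers are not taken from the paper and are not meant to reproduce its reported figures.

```python
def conventional_param_count(weights_per_network: int, num_tasks: int) -> int:
    """Conventional storage: one full copy of the weights for every task."""
    return weights_per_network * num_tasks


def shared_weight_param_count(weights_per_network: int,
                              thresholds_per_task: int,
                              num_child_tasks: int) -> int:
    """Shared storage: one parent weight set plus a threshold vector per child task."""
    return weights_per_network + thresholds_per_task * num_child_tasks


# Hypothetical sizes, chosen only for illustration (not from the paper).
weights = 1_000_000   # weights in the parent network
thresholds = 10_000   # assumed: one threshold per neuron
children = 3          # number of child tasks

conventional = conventional_param_count(weights, children + 1)  # parent + children
shared = shared_weight_param_count(weights, thresholds, children)
ratio = conventional / shared
print(conventional, shared, round(ratio, 2))
```

Because a threshold vector is far smaller than a weight set, the shared scheme grows slowly as child tasks are added, while the conventional scheme grows by a full network per task.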

Findings

1. Achieves ~3.48x higher memory efficiency than conventional multi-task inference.

2. Provides ~2.4-3.1x energy savings on the benchmark datasets (CIFAR10, CIFAR100, and Fashion-MNIST).

3. Enables input-dependent dynamic neuronal pruning for energy-efficient inference.

Abstract

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory- and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on systolic-array hardware. Our experiments with benchmark datasets (child tasks), CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ~3.48x memory efficiency and ~2.4-3.1x energy savings compared to conventional…
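The idea of frozen parent weights combined with task-specific thresholds can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact pruning rule, so the sketch assumes a neuron is zeroed whenever its pre-activation does not exceed a per-neuron, per-task threshold. All names (`PARENT_WEIGHTS`, `TASK_THRESHOLDS`, `infer`, `sparsity`) and all values are hypothetical.

```python
from typing import Dict, List


def dense_preactivation(weights: List[List[float]], x: List[float]) -> List[float]:
    """Compute z = W x for one layer using the frozen parent weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def threshold_prune(z: List[float], thresholds: List[float]) -> List[float]:
    """Assumed rule: zero every neuron whose pre-activation does not exceed its threshold."""
    return [zi if zi > ti else 0.0 for zi, ti in zip(z, thresholds)]


# Frozen parent weights shared by all tasks (hypothetical values).
PARENT_WEIGHTS = [
    [0.5, -0.25],
    [0.25, 0.5],
    [-0.25, 0.75],
]

# Task-specific thresholds: the only per-task parameters (hypothetical values).
TASK_THRESHOLDS: Dict[str, List[float]] = {
    "task_a": [0.0, 0.0, 0.0],
    "task_b": [0.5, 0.5, 1.0],
}


def infer(task: str, x: List[float]) -> List[float]:
    """Run one layer for the given task: shared weights, task-specific pruning."""
    z = dense_preactivation(PARENT_WEIGHTS, x)
    return threshold_prune(z, TASK_THRESHOLDS[task])


def sparsity(activations: List[float]) -> float:
    """Fraction of neurons that were pruned (zeroed) for this input."""
    return sum(1 for a in activations if a == 0.0) / len(activations)


out_a = infer("task_a", [1.0, 1.0])   # no neuron falls below task_a's thresholds
out_b1 = infer("task_b", [1.0, 1.0])  # neurons 0 and 2 are pruned
out_b2 = infer("task_b", [4.0, 0.0])  # a different input prunes only neuron 2
print(out_a, out_b1, out_b2)
```

The same weights serve both tasks, and which neurons are zeroed depends on both the task's thresholds and the input. The zeroed outputs are the work that hardware could in principle skip, although how a systolic array exploits this is outside the scope of the sketch.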
