TASO: Time and Space Optimization for Memory-Constrained DNN Inference

Yuan Wen; Andrew Anderson; Valentin Radu; Michael F.P. O'Boyle; David Gregg

arXiv:2005.10709·cs.LG·May 22, 2020·1 cites

Open Access

TL;DR

This paper introduces TASO, a method for optimizing CNN inference on memory-constrained devices. It balances execution time against memory usage through ILP-based primitive selection and workspace allocation, and reports an 8x speedup over greedy primitive selection and a 2.2x reduction in memory requirement.

Contribution

It presents a novel ILP-based approach for ahead-of-time CNN optimization that balances latency and memory, applicable across platforms and neural architectures.

Findings

01. 8x speedup over greedy primitive selection
02. 2.2x reduction in memory requirement
03. 15% inference time increase compared to inference-time-only optimization

Abstract

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on an integer linear programming (ILP) formulation for selecting primitive operations to implement convolutional layers. We optimize the trade-off between execution time and memory consumption by: 1) attempting to minimize execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriate workspace that reflects the upper bound of the memory footprint per layer. These…
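
The trade-off described in the abstract can be illustrated with a toy sketch. It is not the paper's ILP formulation: it swaps in exhaustive enumeration for the solver, and the candidate primitives, timings, workspace sizes, layouts, and flat layout-conversion cost are all invented for illustration. It also assumes a single shared workspace sized to the largest per-layer requirement, which is one reading of "upper bound of memory footprint per layer". The names `LAYERS`, `plan_cost`, and `select_primitives` are hypothetical.

```python
from itertools import product

# Hypothetical candidates per layer:
# (name, time_ms, workspace_kb, input_layout, output_layout).
# Every number here is made up; none comes from the paper.
LAYERS = [
    [("direct", 9.0, 0, "CHW", "CHW"),
     ("im2col", 4.0, 600, "CHW", "CHW"),
     ("winograd", 2.5, 900, "HWC", "HWC")],
    [("direct", 12.0, 0, "CHW", "CHW"),
     ("im2col", 6.0, 800, "HWC", "HWC"),
     ("winograd", 3.0, 1500, "HWC", "HWC")],
    [("direct", 5.0, 0, "CHW", "CHW"),
     ("im2col", 3.0, 400, "CHW", "CHW")],
]

# Assumed flat cost paid whenever adjacent layers disagree on layout.
LAYOUT_CONVERSION_MS = 1.0


def plan_cost(plan):
    """Return (total_time_ms, shared_workspace_kb) for one primitive per layer."""
    time_ms = sum(p[1] for p in plan)
    # Add a conversion wherever one layer's output layout differs from
    # the next layer's input layout.
    time_ms += sum(LAYOUT_CONVERSION_MS
                   for a, b in zip(plan, plan[1:]) if a[4] != b[3])
    # One shared workspace must fit the most demanding layer.
    workspace_kb = max(p[2] for p in plan)
    return time_ms, workspace_kb


def select_primitives(layers, budget_kb):
    """Fastest plan whose workspace fits the budget, or None if none fits.

    Brute force over all combinations: fine for a toy network, but a real
    network needs a solver such as the ILP approach the paper describes.
    """
    best = None
    for plan in product(*layers):
        time_ms, workspace_kb = plan_cost(plan)
        if workspace_kb > budget_kb:
            continue
        if best is None or time_ms < best[0]:
            best = (time_ms, workspace_kb, [p[0] for p in plan])
    return best


if __name__ == "__main__":
    # Sweeping the budget traces out time/memory trade-off points.
    for budget in (0, 500, 900, 2000):
        print(budget, select_primitives(LAYERS, budget))
```

Tightening the budget in this toy model forces slower primitives with smaller workspaces, which is the same qualitative time-versus-memory trade-off the abstract describes.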

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: 1x1 Convolution · Softmax · Bottleneck Residual Block · Batch Normalization · Average Pooling · Dropout · Dense Connections · Max Pooling · Global Average Pooling