TASO: Time and Space Optimization for Memory-Constrained DNN Inference
Yuan Wen, Andrew Anderson, Valentin Radu, Michael F.P. O'Boyle, David Gregg

TL;DR
This paper introduces TASO, a method for optimizing CNN inference on memory-constrained devices by balancing execution time and memory usage through ILP-based primitive selection and workspace allocation, achieving significant speedups and memory reductions.
Contribution
It presents a novel ILP-based approach for ahead-of-time CNN optimization that balances latency and memory, applicable across platforms and neural architectures.
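As a hedged illustration of what an ILP for this kind of selection can look like, here is a generic formulation for a chain of L layers. The symbols are chosen here for illustration and are not the paper's exact model. The binary variable x_{l,p} = 1 if layer l is implemented by primitive p from its candidate set P_l. The constants t_{l,p} and m_{l,p} are that primitive's execution time and workspace size. The constant c_{l,p,q} >= 0 is the cost of converting the data layout between consecutive choices p and q, and y_{l,p,q} marks that pair. W is the size of a single workspace buffer reused by every layer, and M is the device memory budget.

```latex
\begin{aligned}
\min_{x,\,y,\,W}\quad & \sum_{l=1}^{L}\sum_{p\in P_l} t_{l,p}\,x_{l,p}
  \;+\; \sum_{l=1}^{L-1}\sum_{p\in P_l}\sum_{q\in P_{l+1}} c_{l,p,q}\,y_{l,p,q} \\
\text{s.t.}\quad & \sum_{p\in P_l} x_{l,p} = 1 && \forall l \\
& y_{l,p,q} \ge x_{l,p} + x_{l+1,q} - 1 && \forall l,\; p\in P_l,\; q\in P_{l+1} \\
& \sum_{p\in P_l} m_{l,p}\,x_{l,p} \le W && \forall l \\
& W \le M \\
& x_{l,p},\; y_{l,p,q} \in \{0,1\}
\end{aligned}
```

Because every c_{l,p,q} is non-negative and the objective is minimised, y_{l,p,q} takes the value 1 only when both neighbouring choices are active. Re-solving for smaller M traces the execution-time versus memory trade-off. For a network that is not a simple chain, the y variables would range over the graph's edges instead of consecutive layer pairs.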
Findings
8x speedup over greedy primitive selection
2.2x reduction in memory requirement
15% inference time increase compared to inference-time-only optimization
Abstract
Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on integer linear programming (ILP), for selecting the primitive operations that implement convolutional layers. We optimize the trade-off between execution time and memory consumption by: 1) attempting to minimize execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriate workspace that reflects the upper bound of memory footprint per layer. These…
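The selection problem the abstract describes can be sketched in a few lines. The task is to pick one primitive (and therefore one data layout) per layer, minimise total execution time including layout conversions, and keep each layer's workspace within a shared budget. For a simple chain of layers, this can be solved exactly without an ILP solver by dynamic programming over the layers, using a Viterbi-style recurrence. This is a swapped-in technique, not the paper's ILP. The `Primitive` fields, the `select_primitives` and `convert_ms` helpers, the primitive names, the timings, and the flat 0.5 ms conversion cost are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Primitive:
    """One candidate implementation of a layer (hypothetical fields)."""
    name: str
    layout: str           # assumed: the primitive consumes and produces this layout
    time_ms: float        # profiled execution time (illustrative numbers only)
    workspace_bytes: int  # scratch memory the primitive needs


def select_primitives(
    layers: List[List[Primitive]],
    convert_ms: Callable[[str, str], float],
    budget_bytes: int,
) -> Optional[Tuple[float, List[str], int]]:
    """Choose one primitive per layer of a chain network, minimising
    execution time plus layout-conversion time, subject to a workspace budget.

    Returns (total_ms, chosen names, workspace size) or None if infeasible.
    """
    # prev maps each feasible primitive of the previous layer to
    # (cheapest total cost of a path ending in it, that path).
    prev: Dict[Primitive, Tuple[float, List[Primitive]]] = {}
    for i, choices in enumerate(layers):
        # One workspace buffer is reused by all layers, so every chosen
        # primitive's workspace must fit inside the budget.
        feasible = [p for p in choices if p.workspace_bytes <= budget_bytes]
        if not feasible:
            return None
        cur: Dict[Primitive, Tuple[float, List[Primitive]]] = {}
        for p in feasible:
            if i == 0:
                cur[p] = (p.time_ms, [p])
            else:
                # Extend the cheapest predecessor path, paying for any layout change.
                cur[p] = min(
                    (
                        (cost + convert_ms(q.layout, p.layout) + p.time_ms, path + [p])
                        for q, (cost, path) in prev.items()
                    ),
                    key=lambda entry: entry[0],
                )
        prev = cur
    if not prev:
        return None  # no layers given
    total, path = min(prev.values(), key=lambda entry: entry[0])
    # The shared buffer must hold the largest workspace along the chosen path.
    workspace = max(p.workspace_bytes for p in path)
    return total, [p.name for p in path], workspace


def convert_ms(src: str, dst: str) -> float:
    # Hypothetical flat cost for any layout transformation.
    return 0.0 if src == dst else 0.5


layers = [
    [Primitive("im2col", "NCHW", 4.0, 8_000_000), Primitive("direct", "NCHW", 6.0, 0)],
    [Primitive("winograd", "NHWC", 2.0, 4_000_000), Primitive("direct", "NCHW", 5.0, 0)],
]

if __name__ == "__main__":
    for budget in (10**9, 4_000_000, 0):
        print(budget, select_primitives(layers, convert_ms, budget))
    # 1000000000 (6.5, ['im2col', 'winograd'], 8000000)
    # 4000000 (8.5, ['direct', 'winograd'], 4000000)
    # 0 (11.0, ['direct', 'direct'], 0)
```

Tightening the budget forces lower-workspace primitives and raises total time, which is the qualitative trade-off the paper targets. The numbers above are made up and say nothing about the paper's measured results. A general ILP becomes necessary once the network is a DAG with branches, because a layer's choice then interacts with several neighbours at once.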
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: 1x1 Convolution · Softmax · Bottleneck Residual Block · Batch Normalization · Average Pooling · Dropout · Dense Connections · Max Pooling · Global Average Pooling
