Balancing Accuracy and Latency in Multipath Neural Networks
Mohammed Amer, Tomás Maul, Iman Yi Liao

TL;DR
This paper introduces a neural architecture search method that efficiently balances accuracy and latency in multipath neural networks, enabling the design of resource-efficient models suitable for limited-resource devices.
Contribution
It combines one-shot architecture search with pruning to accurately model and predict the accuracy-latency trade-offs of diverse neural network architectures.
Findings
Accurately models the accuracy-latency relationship across models
Predicts performance of unseen models with high precision
Applicable across different datasets
Abstract
The growing capacity of neural networks has strongly contributed to their success at complex machine learning tasks, and the computational demand of such large models has, in turn, stimulated a significant improvement in the hardware necessary to accelerate their computations. However, models with high latency are not suitable for limited-resource environments such as hand-held and IoT devices. Hence, many deep learning techniques aim to address this problem by developing models with reasonable accuracy without violating the limited-resource constraint. In this work, we use a one-shot neural architecture search model to implicitly evaluate the performance of an intractable number of multipath neural networks. Combining this architecture search with a pruning technique and architecture sample evaluation, we can model the relation between the accuracy and the latency of a spectrum of models…
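As a rough illustration of what an accuracy-latency trade-off over sampled architectures looks like, the sketch below extracts the Pareto front from a set of (latency, accuracy) pairs. It is not the authors' method: the abstract does not say how the relation is modeled, so Pareto filtering is swapped in here as a generic stand-in. The function name `pareto_front` and all sample values are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# (latency in ms, accuracy in [0, 1]) for one sampled architecture.
Sample = Tuple[float, float]


def pareto_front(samples: List[Sample]) -> List[Sample]:
    """Return samples that no other sample beats on both latency and accuracy."""
    front: List[Sample] = []
    best_acc = float("-inf")
    # Visit samples from fastest to slowest; on latency ties, most accurate first.
    for latency, acc in sorted(samples, key=lambda s: (s[0], -s[1])):
        # A slower model is only worth keeping if it is strictly more accurate.
        if acc > best_acc:
            front.append((latency, acc))
            best_acc = acc
    return front


# Hypothetical measurements, invented for illustration only.
samples = [
    (12.0, 0.91),
    (8.0, 0.88),
    (8.0, 0.85),
    (15.0, 0.90),
    (5.0, 0.80),
    (20.0, 0.93),
]
front = pareto_front(samples)
for latency, acc in front:
    print(f"{latency:5.1f} ms  accuracy {acc:.2f}")
```

Under a latency budget, one would pick the last entry of the front whose latency fits the budget. That selection rule is likewise an assumption of this sketch.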
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Pruning
