Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimization
David Eriksson, Pierce I-Jen Chuang, Samuel Daulton, Peng Xia, Akshat Shrivastava, Arun Babu, Shicong Zhao, Ahmed Aly, Ganesh Venkatesh, Maximilian Balandat

TL;DR
This paper presents a latency-aware neural architecture search method that uses multi-objective Bayesian optimization to explore the trade-off between model accuracy and on-device latency for a production-scale natural language understanding model.
Contribution
The paper combines recent Bayesian optimization methods for high-dimensional search spaces with multi-objective Bayesian optimization to tune architecture and hyperparameters against two objectives, on-device latency and model accuracy, in a production environment.
Findings
Effective trade-off exploration between latency and accuracy.
Improved neural architecture search efficiency.
Demonstrated on a production-scale on-device natural language understanding model at Facebook.
Abstract
When tuning the architecture and hyperparameters of large machine learning models for on-device deployment, it is desirable to understand the optimal trade-offs between on-device latency and model accuracy. In this work, we leverage recent methodological advances in Bayesian optimization over high-dimensional search spaces and multi-objective Bayesian optimization to efficiently explore these trade-offs for a production-scale on-device natural language understanding model at Facebook.
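The "optimal trade-offs" the abstract refers to form a Pareto front: the set of configurations for which no other configuration is both faster and at least as accurate (or as fast and more accurate). The following is a minimal sketch of that idea only, not the paper's method. The candidate values and the names `dominates` and `pareto_front` are hypothetical, chosen for illustration. In a Bayesian optimization loop, a surrogate model and acquisition function would decide which architectures to evaluate next; this sketch only filters measurements that have already been taken.

```python
from typing import List, Tuple

# Each candidate is (latency_ms, accuracy). Lower latency and higher accuracy
# are better. The numbers below are made up for illustration.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates,
    sorted by increasing latency."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    return sorted(front)


candidates = [
    (12.0, 0.91),
    (8.0, 0.88),
    (15.0, 0.90),  # slower and less accurate than (12.0, 0.91)
    (8.5, 0.85),   # slower and less accurate than (8.0, 0.88)
    (20.0, 0.93),
]
front = pareto_front(candidates)
print(front)
```

Each point on the resulting front represents a different latency budget, so a practitioner can pick the most accurate model that fits a given device constraint.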
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Bandit Algorithms Research · Advanced Neural Network Applications
