Multi-objective Differentiable Neural Architecture Search
Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

TL;DR
This paper introduces a novel neural architecture search method that efficiently profiles Pareto fronts across multiple devices and objectives using a hypernetwork, enabling zero-shot transfer and outperforming existing methods.
Contribution
The authors propose a hypernetwork-based NAS algorithm that encodes user preferences and hardware features, allowing single-run Pareto front profiling and zero-shot transfer to new devices.
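The idea of a hypernetwork that takes a user preference and a hardware description and outputs an architecture can be illustrated with a toy example. The sketch below is a minimal, hypothetical stand-in and not the authors' MetaHypernetwork. It assumes a single linear layer, a preference vector sampled uniformly from the probability simplex, a fixed device embedding, and a cell-style search space with `num_edges` edges and `num_ops` candidate operations per edge. The names `ToyHypernetwork`, `sample_preference`, and `discretize` are illustrative only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_preference(num_objectives, rng):
    """Draw a preference vector uniformly from the probability simplex
    (Dirichlet(1, ..., 1) obtained by normalizing Exp(1) draws)."""
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


class ToyHypernetwork:
    """Hypothetical single linear layer mapping [preference ; device embedding]
    to one categorical distribution over candidate operations per edge."""

    def __init__(self, pref_dim, device_dim, num_edges, num_ops, seed=0):
        rng = random.Random(seed)
        in_dim = pref_dim + device_dim
        self.num_edges = num_edges
        self.num_ops = num_ops
        # One weight row per (edge, operation) logit.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(num_edges * num_ops)
        ]

    def __call__(self, preference, device_embedding):
        # Condition on both inputs by simple concatenation.
        x = list(preference) + list(device_embedding)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        # Split the flat logits into one softmax per edge.
        return [
            softmax(logits[e * self.num_ops:(e + 1) * self.num_ops])
            for e in range(self.num_edges)
        ]

    def discretize(self, preference, device_embedding):
        """Pick the most probable operation on every edge."""
        probs = self(preference, device_embedding)
        return [max(range(self.num_ops), key=lambda k: p[k]) for p in probs]


# Usage: one preference (e.g. accuracy vs. latency) on one hypothetical device.
rng = random.Random(42)
hnet = ToyHypernetwork(pref_dim=2, device_dim=3, num_edges=4, num_ops=5)
pref = sample_preference(2, rng)
device = [0.1, -0.2, 0.3]
arch = hnet.discretize(pref, device)
```

Because the preference and the device are inputs rather than fixed constants, the same trained weights can be queried with many preferences and devices after a single search run, which is the property the summary attributes to the method.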
Findings
Effective profiling of Pareto fronts across 19 devices.
Outperforms existing MOO NAS methods on various datasets.
Scalable to diverse search spaces and objectives.
Abstract
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in just a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that…
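The two notions in the abstract, a Pareto optimal set and a user preference that trades off objectives, can be made concrete on a handful of candidates. The sketch below assumes every objective is minimized and already normalized to a comparable scale, and it uses linear (weighted-sum) scalarization as the preference mechanism. That is one common choice, not necessarily the paper's exact formulation. The candidate values are made up for illustration.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated subset (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(objectives, preference):
    """Linear scalarization: preference-weighted sum of normalized objectives."""
    return sum(w * f for w, f in zip(preference, objectives))


def select(points, preference):
    """Return the candidate that minimizes the scalarized objective."""
    return min(points, key=lambda p: scalarize(p, preference))


# Hypothetical candidates as (error, normalized latency), both minimized.
candidates = [(0.08, 0.9), (0.12, 0.4), (0.20, 0.1), (0.15, 0.5)]
front = pareto_front(candidates)  # (0.15, 0.5) is dominated by (0.12, 0.4)
```

With these numbers, a preference of (1.0, 0.0) selects the most accurate candidate (0.08, 0.9), (0.9, 0.1) selects (0.12, 0.4), and (0.5, 0.5) selects the fastest one (0.20, 0.1). Sweeping preferences in this way is what a conventional approach would repeat with one search per setting.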
Peer Reviews
Decision: ICLR 2025 Poster
1. The method is adaptable to a range of devices by conditioning the hypernetwork on device embeddings, making it highly versatile for deployment on diverse hardware.
2. MODNAS is tested on various hardware devices and tasks, including image classification, machine translation, and language modeling, showcasing its applicability across multiple domains.
1. Using hypernetworks for NAS is well known, but it does not seem a promising solution; it resembles a heuristic.
2. I think energy and latency are not necessarily conflicting metrics.
- The paper proposes to execute one-shot NAS such that the searched networks satisfy multiple objectives covering accuracy and hardware efficiency on the target hardware.
- The paper analyzes the proposed method from various aspects, such as efficacy and robustness of the training process.
- The paper provides extensive experiments and visualizations to support the proposal and its analysis.
- It may be hard to regulate the trade-off among user preferences with scalarization. Figure 4 offers some support, but it is only an abstract depiction, not an experimental result.
- The proposed method can search for optimized network architectures quickly and at low cost. However, architectures identical or close to the ground-truth solutions may be hard to reach with the MetaHypernetwork, whereas other works can reach them at a much higher search cost.
- The extension of existing zero-shot NAS techniques with hypernetworks for HW-aware NAS is well motivated and nicely contextualized relative to already existing techniques.
- The authors provide an extensive evaluation covering different applications (language modeling, vision, translation), multiple well-known NAS search spaces, and different target spaces (2-3 dimensional).
- The authors do not show how the proposed DNN architectures would actually perform on the different target systems. As far as I understand, the HV results shown are calculated using the estimated results from the "MetaPredictors". While this still allows a relative comparison with the other techniques and algorithms evaluated in the paper, it makes it hard to evaluate the actual usefulness and effectiveness of the approach.
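The hypervolume (HV) mentioned in the last review is the area (or volume) of objective space dominated by a set of solutions and bounded by a reference point; larger is better. A minimal two-objective sketch follows, assuming both objectives are minimized and that the reference point is supplied by the caller. The paper's own normalization, reference points, and number of objectives may differ, and the function name `hypervolume_2d` is illustrative.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by reference point `ref`,
    with both objectives minimized (sweep-line over the first objective)."""
    # Only points strictly better than the reference in both objectives count.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_y = ref[1]
    for x, y in pts:
        # A point with y >= best_y is dominated by an earlier point: no new area.
        if y < best_y:
            # New horizontal strip between this point's y and the previous best.
            hv += (ref[0] - x) * (best_y - y)
            best_y = y
    return hv


print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))  # → 6.0
```

Because HV depends only on the objective values fed into it, computing it from predicted rather than measured latencies yields a consistent relative ranking between methods but not a guarantee of on-device behavior, which is the reviewer's point.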
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training · Pointwise Convolution · ReLU6 · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · Global Average Pooling · 1x1 Convolution · Sigmoid Activation · Hard Swish
