Multi-objective Differentiable Neural Architecture Search

Rhea Sanjay Sukthanker; Arber Zela; Benedikt Staffler; Samuel Dooley; Josif Grabocka; Frank Hutter

arXiv:2402.18213·cs.LG·February 6, 2025·2 cites

TL;DR

This paper introduces a novel neural architecture search method that efficiently profiles Pareto fronts across multiple devices and objectives using a hypernetwork, enabling zero-shot transfer and outperforming existing methods.

Contribution

The authors propose a hypernetwork-based NAS algorithm that encodes user preferences and hardware features, allowing single-run Pareto front profiling and zero-shot transfer to new devices.
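
As a rough illustration of the idea, the sketch below maps a preference vector and a device embedding to one categorical distribution over operations per edge. Everything here is an assumption made for illustration: the class name `TinyHypernetwork`, the single linear layer, the dimensions, and the example device features are hypothetical and are not taken from the paper or the automl/modnas repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyHypernetwork:
    """Hypothetical one-layer hypernetwork (illustrative only).

    Input: a preference vector over objectives concatenated with a
    device embedding. Output: one probability distribution over
    candidate operations for every edge of the architecture.
    """

    def __init__(self, n_objectives, device_dim, n_edges, n_ops, seed=0):
        rng = random.Random(seed)
        self.n_edges = n_edges
        self.n_ops = n_ops
        in_dim = n_objectives + device_dim
        out_dim = n_edges * n_ops
        # Randomly initialized weights; a real system would train them.
        self.weights = [
            [rng.gauss(0.0, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, preference, device_embedding):
        x = list(preference) + list(device_embedding)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        # Split the flat logit vector into one softmax per edge.
        return [
            softmax(logits[e * self.n_ops:(e + 1) * self.n_ops])
            for e in range(self.n_edges)
        ]


hypernet = TinyHypernetwork(n_objectives=2, device_dim=3, n_edges=4, n_ops=5)
device = [0.2, -1.0, 0.7]  # made-up device features
accuracy_focused = hypernet([0.9, 0.1], device)
latency_focused = hypernet([0.1, 0.9], device)
```

Because the preference vector is an input rather than a fixed constraint, a single set of weights can, in principle, return a different architecture distribution for each trade-off and each device without a new search.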

Findings

01 Effective profiling of Pareto fronts across 19 devices.

02 Outperforms existing MOO NAS methods on various datasets.

03 Scalable to diverse search spaces and objectives.

Abstract

Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in just a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that…
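
To make "Pareto optimal" concrete, here is a minimal sketch of non-dominated filtering for two objectives that are both minimized. The candidate values (error, latency in ms) are invented for illustration and are not results from the paper; the function names are likewise hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (error, latency_ms) pairs for six candidate architectures.
candidates = [
    (0.30, 12.0),
    (0.25, 20.0),
    (0.28, 15.0),
    (0.35, 11.0),
    (0.29, 16.0),  # dominated by (0.28, 15.0)
    (0.40, 30.0),  # dominated by every other candidate
]

front = pareto_front(candidates)
```

The four surviving points are the ones for which lowering error is only possible by accepting higher latency, and vice versa. This brute-force filter is quadratic in the number of candidates, which is fine for a sketch but is separate from the cost the abstract refers to, namely that each candidate requires training a network.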

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The method is adaptable to a range of devices by conditioning the hypernetwork on device embeddings, making it highly versatile for deployment on diverse hardware.
2. MODNAS is tested on various hardware devices and tasks, including image classification, machine translation, and language modeling, showcasing its applicability across multiple domains.

Weaknesses

1. Using hypernetworks for NAS is well known but does not seem to be a promising solution; it reads more like a heuristic.
2. I think energy and latency are not necessarily conflicting metrics.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The paper proposes one-shot NAS in which the searched networks satisfy multiple objectives covering accuracy and hardware efficiency on the target hardware.
- The paper analyzes the proposed method from various aspects, such as efficacy and robustness of the training process.
- The paper provides extensive experiments and visualizations to support the proposal and its analysis.

Weaknesses

- It may be hard to regulate the trade-off among user preferences with scalarization. Figure 4 offers some support, but it is only an abstract depiction, not an experimental result.
- The proposed method can help search for optimized network architectures quickly and at low cost. However, architectures identical or close to the ground-truth solutions may be hard to reach with $\textbf{MetaHypernetwork}$, whereas other works can reach them at a huge search cost.
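
For readers unfamiliar with the term, the sketch below shows scalarization in its simplest form: a plain weighted sum of normalized objectives, with the preference vector supplying the weights. This is an assumed, generic formulation and may differ from the scheme the paper uses; the function names and the candidate values are hypothetical.

```python
def scalarize(objectives, preference):
    """Weighted sum of objective values (all objectives are minimized)."""
    return sum(w * f for w, f in zip(preference, objectives))


def best_for_preference(candidates, preference):
    """Return the candidate with the lowest scalarized score."""
    return min(candidates, key=lambda c: scalarize(c, preference))


# Made-up, normalized (error, latency) pairs on a Pareto front.
candidates = [(0.0, 1.0), (0.3, 0.4), (0.6, 0.15), (1.0, 0.0)]

error_first = best_for_preference(candidates, (0.9, 0.1))    # (0.0, 1.0)
balanced = best_for_preference(candidates, (0.5, 0.5))       # (0.3, 0.4)
latency_lean = best_for_preference(candidates, (0.3, 0.7))   # (0.6, 0.15)
latency_first = best_for_preference(candidates, (0.1, 0.9))  # (1.0, 0.0)
```

A known limitation of the plain weighted sum is that it can only select points on the convex hull of the front, so how evenly preferences map to trade-offs depends on the shape of the front.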

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The extension of existing zero-shot NAS techniques with hypernetworks for hardware-aware NAS is well motivated and nicely contextualized with respect to existing techniques.
- The authors provide an extensive evaluation for different applications (language modeling, vision, translation), multiple well-known NAS search spaces, and different target spaces (2-3 dimensional).

Weaknesses

- The authors do not show how the proposed DNN architectures would actually perform on the different target systems. As far as I understand it, the HV results shown are calculated using the estimated results from the "MetaPredictors". While this still allows for a relative comparison with the other techniques and algorithms evaluated in the paper, it makes it hard to evaluate the actual usefulness and effectiveness of the approach.
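
The HV (hypervolume) indicator mentioned in this review measures the region of objective space dominated by a front and bounded by a reference point. The sketch below computes it for two minimized objectives with a simple sweep; the function name, the example front, and the reference point are assumptions for illustration, not the paper's evaluation code.

```python
def hypervolume_2d(front, reference):
    """Area dominated by `front` and bounded by `reference`.

    Both objectives are minimized. Points that do not improve on the
    reference point in both objectives contribute nothing.
    """
    points = sorted(
        p for p in front if p[0] < reference[0] and p[1] < reference[1]
    )
    area = 0.0
    best_second = reference[1]
    for first, second in points:
        # Sweeping by increasing first objective, a point adds area only
        # if it lowers the best second-objective value seen so far.
        if second < best_second:
            area += (reference[0] - first) * (best_second - second)
            best_second = second
    return area


# Made-up front and reference point.
front = [(1, 3), (2, 2), (3, 1)]
hv = hypervolume_2d(front, reference=(4, 4))  # 6.0

# Adding a dominated point leaves the hypervolume unchanged.
hv_with_dominated = hypervolume_2d(front + [(2.5, 2.5)], reference=(4, 4))
```

Since the indicator depends only on the objective values fed into it, computing it from predicted rather than measured values ranks methods by their predicted fronts, which is the reviewer's concern.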

Code & Models

Repositories

automl/modnas · PyTorch · Official

Videos

Multi-objective Differentiable Neural Architecture Search · SlidesLive

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training · Pointwise Convolution · ReLU6 · Depthwise Convolution · Depthwise Separable Convolution · Batch Normalization · Global Average Pooling · 1x1 Convolution · Sigmoid Activation · Hard Swish