MIDAS: Mosaic Input-Specific Differentiable Architecture Search

Konstanty Subbotko

arXiv:2602.17700·cs.LG·February 23, 2026

Open Access 3 Reviews

TL;DR

MIDAS introduces input-specific, attention-based architecture parameters for neural architecture search, improving robustness and performance across multiple benchmarks by localizing architecture decisions and modeling node connectivity.

Contribution

It modernizes DARTS with dynamic, input-specific parameters and a topology-aware search space, enhancing robustness and search effectiveness.

Findings

01

Achieves 97.42% top-1 accuracy on CIFAR-10.

02

Finds globally optimal architectures in NAS-Bench-201.

03

Sets state-of-the-art results on CIFAR-10 in RDARTS.

Abstract

Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. To improve robustness, MIDAS (i) localizes the architecture selection by computing it separately for each spatial patch of the activation map, and (ii) introduces a parameter-free, topology-aware search space that models node connectivity and simplifies selecting the two incoming edges per node. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces. In DARTS, it reaches 97.42% top-1 on CIFAR-10 and 83.38% on CIFAR-100. In NAS-Bench-201, it consistently finds globally optimal architectures. In RDARTS, it sets the state of…
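The idea of input-specific, patch-level architecture parameters can be illustrated with a small sketch. This is not the authors' implementation: it assumes each spatial patch is already reduced to a plain feature vector, uses three hypothetical candidate operations (`op_identity`, `op_negate`, `op_zero`), and omits any learned query/key projections. Following the description in the reviews, the query is taken from the raw patch features and the keys from the features after each candidate operation; a softmax over the scaled dot products gives the per-patch mixing weights.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical candidate operations acting on one patch vector.
def op_identity(patch):
    return list(patch)

def op_negate(patch):
    return [-x for x in patch]

def op_zero(patch):
    return [0.0 for _ in patch]

CANDIDATE_OPS = [op_identity, op_negate, op_zero]

def patch_mixing_weights(patch):
    """Attention-style weights over candidate ops for a single patch.

    Assumption: query = raw patch, key = patch after each op,
    with no learned projections.
    """
    scale = math.sqrt(len(patch))
    scores = [dot(patch, op(patch)) / scale for op in CANDIDATE_OPS]
    return softmax(scores)

def mixed_edge(patches):
    """Mix candidate op outputs separately for every patch."""
    outputs, weights = [], []
    for patch in patches:
        w = patch_mixing_weights(patch)
        op_outs = [op(patch) for op in CANDIDATE_OPS]
        mixed = [
            sum(w[k] * op_outs[k][i] for k in range(len(CANDIDATE_OPS)))
            for i in range(len(patch))
        ]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights

# Three toy patches; each gets its own mixing weights.
patches = [[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]]
outputs, weights = mixed_edge(patches)
```

In static DARTS, one weight vector per edge would be shared by all inputs. Here the weights change with the patch content, which is the property the abstract refers to as input-specific and localized.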

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The paper presents its technical details clearly. The authors evaluate on three search spaces and provide some ablations of their own method.

Weaknesses

Differentiable architecture search aims to minimize search cost, but the field still lacks theoretical support for why a shared-weight supernet with a differentiable objective can drive the architecture selection process. Any research that does not clearly address this direction seems to be meaningful only from a practical point of view. In this regard, this paper presents results that are only on par with, or even worse than, some older baselines, like beta-DARTS in 2

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- Interesting idea and novel use of the self-attention technique with a promising new direction. Replacing global architecture parameters with a learned mapping from input features to architecture mixing weights (via attention) is a concrete extension not widely explored in standard DNAS papers.
- Methodological process follows best practices for fairness and reproducibility.

Weaknesses

- The contributions do not read well. They could be made more explicit and clearer.
- It is not clear how this scales to higher-resolution datasets, since self-attention has quadratic scaling.
- I would like to see more varied datasets used in the architecture search experiments. CIFAR-100 and CIFAR-10 have more or less the same properties (e.g., content, resolution).
- Other NAS methods make the problem stricter by incorporating additional constraints on FLOPs, memory, or parameters.
- Lim

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The idea of conditioning architecture parameters on input features is interesting and could make NAS more adaptive.
2. The paper provides extensive experiments and ablations across several standard benchmarks.

Weaknesses

1. I do not see the necessity of partitioning the feature map and learning patch-level architecture parameters, since Eq. (14) simply averages them to obtain a global parameter. This seems mathematically equivalent to directly learning a global weight as in DARTS. The authors should clarify the actual benefit of this design.
2. The use of self-attention is questionable. The defined queries (from raw features) and keys (from operated features) do not form true self-attention, and the semantic meani

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks