MIDAS: Mosaic Input-Specific Differentiable Architecture Search
Konstanty Subbotko

TL;DR
MIDAS introduces input-specific, attention-based architecture parameters for neural architecture search, improving robustness and performance across multiple benchmarks by localizing architecture decisions and modeling node connectivity.
Contribution
It modernizes DARTS with dynamic, input-specific parameters and a topology-aware search space, enhancing robustness and search effectiveness.
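For context, the static baseline that this contribution departs from can be sketched as follows. In standard DARTS, each edge holds one learned vector of architecture parameters, and the edge output is a softmax-weighted sum of the candidate operations, with the same weights for every input. The toy scalar operations and the names `mixed_op`, `ops`, and `alpha` are illustrative assumptions, not code from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alpha):
    """DARTS-style mixed operation: softmax(alpha)-weighted sum of op outputs."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Toy scalar stand-ins for candidate operations (hypothetical, for illustration).
ops = [lambda x: 2.0 * x, lambda x: x, lambda x: 0.0]

# Static architecture parameters: one vector per edge, shared by all inputs.
alpha = [0.0, 0.0, 0.0]

static_weights = softmax(alpha)
out_a = mixed_op(1.0, ops, alpha)
out_b = mixed_op(3.0, ops, alpha)
```

Because `alpha` does not depend on `x`, both calls use identical mixing weights. That is the property an input-specific parameterization replaces.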
Findings
Achieves 97.42% top-1 accuracy on CIFAR-10.
Finds globally optimal architectures in NAS-Bench-201.
Sets state-of-the-art results on CIFAR-10 in RDARTS.
Abstract
Differentiable Neural Architecture Search (NAS) provides efficient, gradient-based methods for automatically designing neural networks, yet its adoption remains limited in practice. We present MIDAS, a novel approach that modernizes DARTS by replacing static architecture parameters with dynamic, input-specific parameters computed via self-attention. To improve robustness, MIDAS (i) localizes the architecture selection by computing it separately for each spatial patch of the activation map, and (ii) introduces a parameter-free, topology-aware search space that models node connectivity and simplifies selecting the two incoming edges per node. We evaluate MIDAS on the DARTS, NAS-Bench-201, and RDARTS search spaces. In DARTS, it reaches 97.42% top-1 on CIFAR-10 and 83.38% on CIFAR-100. In NAS-Bench-201, it consistently finds globally optimal architectures. In RDARTS, it sets the state of…
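A minimal sketch of what input-specific, patch-wise architecture weights could look like is given here. It assumes, following the reviewers' description on this page, that each patch contributes a query from its input features and one key per candidate operation from that operation's output, and that patch-level weights are averaged afterwards. The function names, the toy vectors, and the exact scoring rule (scaled dot product) are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def patch_op_weights(query, keys):
    """Scaled dot-product scores of one patch query against one key per op."""
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])

def input_specific_weights(patch_queries, patch_keys):
    """Return per-patch mixing weights and their mean over patches."""
    per_patch = [patch_op_weights(q, ks)
                 for q, ks in zip(patch_queries, patch_keys)]
    n_ops = len(per_patch[0])
    mean = [sum(w[i] for w in per_patch) / len(per_patch)
            for i in range(n_ops)]
    return per_patch, mean

# Hypothetical toy input: 2 patches, 2 candidate ops, feature dimension 2.
patch_queries = [[1.0, 0.0], [0.0, 1.0]]
patch_keys = [
    [[2.0, 0.0], [0.0, 2.0]],  # keys for op 0 and op 1 at patch 0
    [[2.0, 0.0], [0.0, 2.0]],  # keys for op 0 and op 1 at patch 1
]

per_patch, mean_weights = input_specific_weights(patch_queries, patch_keys)
```

In this toy input the two patches prefer different operations, while the averaged weights are balanced. The example only shows that patch-level weights carry information the average discards; whether that matters in practice depends on details not given on this page.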
Peer Reviews
Decision · Submitted to ICLR 2026
The paper is written fairly clearly with respect to its technical details. The authors evaluate on three search spaces and provide some ablations of their own method.
Weaknesses: Differentiable architecture search aims to minimize search cost, but the field still lacks theoretical support for why a shared-weight supernet with a differentiable objective should work as an architecture selection process. Research that does not clearly address this question seems meaningful only from a practical point of view. In this regard, the paper presents results that are only on par with, or even worse than, some older baselines, like beta-DARTS in 2
- Interesting idea and novel use of the self-attention technique with a promising new direction. Replacing global architecture parameters with a learned mapping from input features to architecture mixing weights (via attention) is a concrete extension not widely explored in standard DNAS papers.
- Methodological process follows best practices for fairness and reproducibility.
- The contributions do not read well. They could be made more explicit and clearer.
- It is not clear how this scales to larger-resolution datasets, since self-attention scales quadratically.
- I would like to see more varied datasets used in the search experiments. CIFAR-100 and CIFAR-10 have more or less the same properties (e.g., content, resolution).
- Other NAS methods make the problem stricter by incorporating additional constraints on FLOPs, memory, or parameters.
- Lim
1. The idea of conditioning architecture parameters on input features is interesting and could make NAS more adaptive.
2. The paper provides extensive experiments and ablations across several standard benchmarks.
1. I do not see the necessity of partitioning the feature map and learning patch-level architecture parameters, since Eq. (14) simply averages them to obtain a global parameter. This seems mathematically equivalent to directly learning a global weight as in DARTS. The authors should clarify the actual benefit of this design.
2. The use of self-attention is questionable. The defined queries (from raw features) and keys (from operated features) do not form true self-attention, and the semantic meani
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
