# PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

**Authors:** Shun Lu, Yu Hu, Longxing Yang, Zihao Sun, Jilin Mei, Jianchao Tan, Chengru Song

arXiv: 2302.14772 · 2023-03-01

## TL;DR

This paper introduces PA&DA, a method that jointly optimizes sampling distributions of paths and data during supernet training to reduce gradient variance, improve ranking consistency, and enhance NAS performance.

## Contribution

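The paper derives how the gradient variance of supernet training depends on the sampling distributions of paths and training data, and shows that the optimal sampling probability is proportional to the normalized gradient norm. Building on this, it proposes an importance sampling strategy that uses the normalized gradient norm as the importance indicator for both paths and data, leading to more reliable supernet training and better architecture search results.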

## Key findings

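- Lower gradient variance during supernet training, at negligible extra computation cost for optimizing the sampling distributions.
- More reliable ranking consistency of supernet evaluations than the other improved approaches compared, across various search spaces.
- Higher accuracy of the searched architectures.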
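
To make "ranking consistency" concrete: one common way to quantify it is a rank correlation, such as Kendall's tau, between the accuracies a supernet assigns to sub-models and their stand-alone accuracies. The sketch below is a minimal pure-Python illustration of that idea. The function name `kendall_tau` and the accuracy values are hypothetical, and whether this matches the exact metric and protocol used in the paper is an assumption.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies for four sub-models (illustrative values only).
supernet_acc = [0.61, 0.58, 0.70, 0.66]      # evaluated with inherited weights
standalone_acc = [0.92, 0.91, 0.935, 0.94]   # trained from scratch

tau = kendall_tau(supernet_acc, standalone_acc)
print(f"Kendall's tau: {tau:.3f}")
```

A value near 1 means the supernet orders sub-models almost the same way as stand-alone training does. A value near 0 means the supernet ranking carries little information about true performance.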

## Abstract

Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then inherit the pre-trained weights to evaluate sub-models, largely reducing the search cost. However, several works have pointed out that the shared weights suffer from different gradient descent directions during training. And we further find that large gradient variance occurs during supernet training, which degrades the supernet ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). We theoretically derive the relationship between the gradient variance and the sampling distributions, and reveal that the optimal sampling probability is proportional to the normalized gradient norm of path and training data. Hence, we use the normalized gradient norm as the importance indicator for path and training data, and adopt an importance sampling strategy for the supernet training. Our method only requires negligible computation cost for optimizing the sampling distributions of path and data, but achieves lower gradient variance during supernet training and better generalization performance for the supernet, resulting in a more consistent NAS. We conduct comprehensive comparisons with other improved approaches in various search spaces. Results show that our method surpasses others with more reliable ranking performance and higher accuracy of searched architectures, showing the effectiveness of our method. Code is available at https://github.com/ShunLu91/PA-DA.
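
The core idea in the abstract, sampling with probability proportional to the normalized gradient norm, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sampling_probs` and `sample_with_weights`, the gradient-norm values, and the uniform fallback are all hypothetical. The `1 / (N * p_i)` reweighting is the standard importance-sampling correction, and whether the paper applies it in exactly this form is an assumption.

```python
import random


def sampling_probs(grad_norms, eps=1e-12):
    """Normalize non-negative gradient norms into a sampling distribution."""
    total = sum(grad_norms)
    if total <= eps:
        # Assumption: fall back to uniform sampling when there is no gradient signal.
        return [1.0 / len(grad_norms)] * len(grad_norms)
    return [g / total for g in grad_norms]


def sample_with_weights(probs, k, rng):
    """Draw k indices with replacement and return (index, weight) pairs.

    The weight 1 / (N * p_i) is the standard importance-sampling correction
    that keeps a weighted average unbiased with respect to uniform sampling.
    """
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=k)
    return [(i, 1.0 / (n * probs[i])) for i in indices]


# Hypothetical gradient norms for four candidate items (paths or data samples).
grad_norms = [0.5, 1.5, 1.0, 1.0]
probs = sampling_probs(grad_norms)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
batch = sample_with_weights(probs, k=3, rng=rng)
for index, weight in batch:
    print(f"item {index}: p={probs[index]:.3f}, weight={weight:.3f}")
```

Items with larger gradient norms are drawn more often but receive proportionally smaller weights. In a real training loop the norms would be re-estimated periodically from the supernet, which this sketch leaves out.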

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14772/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14772/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/2302.14772/full.md

---
Source: https://tomesphere.com/paper/2302.14772