Delve into the Performance Degradation of Differentiable Architecture Search
Jiuling Zhang, Zhiming Ding

TL;DR
This paper investigates why differentiable architecture search (DARTS) suffers performance degradation, links the degradation to operation selection bias, and proposes a simple learning rate swap that effectively mitigates it.
Contribution
It challenges the assumption that well-trained supernet weights are essential for DARTS performance and argues that the architecture parameters should instead be trained with gradients obtained in the early stage of training rather than the final stage.
Findings
Swapping the learning rate schemes of the supernet weights and the architecture parameters mitigates degradation.
Degradation is linked to operation selection bias, not just overfitting.
Operation-magnitude-based selective stop improves DARTS performance.
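The learning rate swap in the first finding can be illustrated with a minimal sketch. It assumes a typical DARTS-style setup in which the supernet weights follow a cosine-annealed learning rate and the architecture parameters use a small constant one. The numeric values and the helper names (`cosine_schedule`, `constant_schedule`, `build_schedules`) are illustrative assumptions, not settings taken from the paper, and the paper's exact swap procedure may differ.

```python
import math


def cosine_schedule(base_lr, min_lr, total_epochs):
    """Return a function epoch -> lr that decays from base_lr to min_lr."""
    def lr_at(epoch):
        progress = epoch / total_epochs
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
    return lr_at


def constant_schedule(lr):
    """Return a function epoch -> lr that ignores the epoch."""
    def lr_at(epoch):
        return lr
    return lr_at


def build_schedules(total_epochs, swap=False):
    """Build per-group schedules; with swap=True the two schemes are exchanged.

    The values below are placeholders chosen for illustration only.
    """
    weight_schedule = cosine_schedule(base_lr=0.025, min_lr=0.001,
                                      total_epochs=total_epochs)
    arch_schedule = constant_schedule(3e-4)
    if swap:
        # Hypothetical reading of the swap: the architecture parameters take
        # the decaying scheme and the weights take the constant one.
        weight_schedule, arch_schedule = arch_schedule, weight_schedule
    return {"weights": weight_schedule, "arch": arch_schedule}


if __name__ == "__main__":
    swapped = build_schedules(total_epochs=50, swap=True)
    for epoch in (0, 25, 50):
        print(epoch, swapped["weights"](epoch), swapped["arch"](epoch))
```

Under this reading, the architecture parameters take their largest steps early in the search and smaller ones later, which is consistent with the claim that early-stage gradients should drive them. In a real search loop, the returned functions would be queried once per epoch to set the learning rate of each optimizer's parameter group.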
Abstract
Differentiable architecture search (DARTS) is widely considered prone to overfitting the validation set, which leads to performance degradation. We first employ a series of exploratory experiments to verify that neither strong regularization of the architecture parameters nor a warmup training scheme can effectively solve this problem. Based on the insights from the experiments, we conjecture that the performance of DARTS does not depend on well-trained supernet weights and argue that the architecture parameters should be trained by the gradients which are obtained in the early stage rather than the final stage of training. This argument is then verified by exchanging the learning rate schemes of weights and parameters. Experimental results show that the simple swap of the learning rates can effectively solve the degradation and achieve competitive performance. Further empirical…
Taxonomy
Methods: Differentiable Architecture Search
