Delve into the Performance Degradation of Differentiable Architecture Search

Jiuling Zhang; Zhiming Ding

arXiv:2109.13466 · cs.LG · September 29, 2021

TL;DR

This paper investigates why differentiable architecture search (DARTS) suffers from performance degradation, proposes a simple learning-rate swap that effectively mitigates it, and links the degradation to operation selection bias.

Contribution

It challenges the assumption that well-trained supernet weights are essential to DARTS performance, and proposes training the architecture parameters with gradients obtained early in training rather than at the end.
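Why a decaying learning rate favors early-stage gradients can be checked with a little arithmetic. The sketch below is an illustration, not the authors' code. It assumes equal-magnitude gradients at every epoch and compares how much of the total update to the architecture parameters comes from the first half of the search under a constant learning rate versus a cosine-decayed one. The 50 epochs and the 3e-4 peak rate are assumed DARTS-style values, and `cosine_lr` and `early_share` are hypothetical helpers.

```python
import math

def cosine_lr(epoch: int, total: int, lr_max: float, lr_min: float) -> float:
    """Cosine annealing: lr_max at epoch 0, approaching lr_min at the end."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total))

def early_share(lrs: list[float]) -> float:
    """Fraction of the summed step sizes that falls in the first half of training.

    Assuming equal-magnitude gradients, this is the share of the total
    update contributed by early-stage gradients."""
    half = len(lrs) // 2
    return sum(lrs[:half]) / sum(lrs)

EPOCHS = 50  # assumed search length
constant = [3e-4] * EPOCHS  # assumed constant architecture learning rate
cosine = [cosine_lr(e, EPOCHS, 3e-4, 0.0) for e in range(EPOCHS)]

print(f"constant: {early_share(constant):.2f}")
print(f"cosine:   {early_share(cosine):.2f}")
```

Under the constant schedule the split is exactly even. The cosine schedule puts roughly four fifths of the update in the first half, so a decaying schedule on the architecture parameters shifts their training toward early-stage gradients.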

Findings

01

Swapping the learning-rate schemes of the supernet weights and the architecture parameters mitigates the degradation.

02

Degradation is linked to operation selection bias, not just overfitting.

03

Operation-magnitude-based selective stop improves DARTS performance.
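Findings 02 and 03 concern how operations are chosen from the architecture parameters α. For reference, here is a minimal sketch of the standard derivation rule from the original DARTS paper, not this work's selective stop. For each node, every incoming edge keeps its strongest non-`none` operation by softmax weight, and the node keeps its top-k such edges (k = 2 in DARTS). The reduced operation list, the `derive_node` helper, and the example logits are hypothetical.

```python
import math

# Hypothetical reduced candidate set; DARTS itself uses eight operations.
OPS = ["none", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def derive_node(alpha_by_edge: dict[int, list[float]], k: int = 2) -> list[tuple[int, str]]:
    """alpha_by_edge maps an input-node index to alpha logits over OPS.

    Returns the k (input_node, op_name) pairs kept for this node."""
    candidates = []
    for src, logits in alpha_by_edge.items():
        w = softmax(logits)
        # Strongest operation on this edge, ignoring 'none'.
        best = max((i for i, name in enumerate(OPS) if name != "none"), key=lambda i: w[i])
        candidates.append((w[best], src, OPS[best]))
    # Keep the k edges whose best operation has the largest softmax weight.
    candidates.sort(reverse=True)
    return [(src, op) for _, src, op in candidates[:k]]

example_alpha = {
    0: [0.1, 0.9, 0.3, 0.0],
    1: [0.0, 0.2, 0.8, 0.1],
    2: [0.5, 0.1, 0.1, 0.1],
}
kept = derive_node(example_alpha)
print(kept)
```

The selection depends only on the relative magnitudes of α. Any systematic bias in how those magnitudes grow during search, for example toward parameter-free operations such as `skip_connect`, is carried directly into the derived cell.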

Abstract

Differentiable architecture search (DARTS) is widely considered prone to overfitting the validation set, which leads to performance degradation. We first employ a series of exploratory experiments to verify that neither strong regularization of the architecture parameters nor a warmup training scheme can effectively solve this problem. Based on the insights from these experiments, we conjecture that the performance of DARTS does not depend on well-trained supernet weights, and argue that the architecture parameters should be trained with gradients obtained in the early stage rather than the final stage of training. This argument is then verified by exchanging the learning rate schemes of the weights and the architecture parameters. Experimental results show that this simple swap of learning rates effectively resolves the degradation and achieves competitive performance. Further empirical…
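The swap described in the abstract can be pictured as exchanging two per-group schedules. The sketch below assumes DARTS-style defaults: a cosine-annealed rate from 0.025 to 0.001 for the supernet weights and a constant 3e-4 for α. It also assumes that "exchanging the schemes" means swapping the whole schedule functions. The summary does not say whether the paper also rescales magnitudes or changes optimizers, so treat this as one possible reading. `cosine`, `constant`, and `swap_schedules` are hypothetical helpers.

```python
import math
from typing import Callable, Dict

Schedule = Callable[[int], float]  # maps epoch -> learning rate

EPOCHS = 50  # assumed search length

def cosine(lr_max: float, lr_min: float, total: int = EPOCHS) -> Schedule:
    """Cosine annealing: lr_max at epoch 0, approaching lr_min at the end."""
    return lambda e: lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * e / total))

def constant(lr: float) -> Schedule:
    """Same learning rate at every epoch."""
    return lambda e: lr

# Assumed DARTS-style baseline: decaying rate for weights, constant rate for alpha.
baseline: Dict[str, Schedule] = {
    "weights": cosine(0.025, 0.001),
    "alpha": constant(3e-4),
}

def swap_schedules(scheds: Dict[str, Schedule]) -> Dict[str, Schedule]:
    """Hypothetical helper: exchange the schedules of the two parameter groups."""
    return {"weights": scheds["alpha"], "alpha": scheds["weights"]}

swapped = swap_schedules(baseline)
for e in (0, 25, 49):
    print(e, f"weights={swapped['weights'](e):.4g}", f"alpha={swapped['alpha'](e):.4g}")
```

After the swap, α receives its largest steps early and small steps late. This matches the abstract's argument that the architecture parameters should be driven mainly by early-stage gradients, while the weights train at a flat rate.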

Taxonomy

Methods: Differentiable Architecture Search