Re-parameterizing Your Optimizers rather than Architectures
Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

TL;DR
This paper introduces RepOptimizers, a method to incorporate model-specific prior knowledge into optimizers through gradient re-parameterization, enabling simple models like VGG to perform competitively with complex architectures.
Contribution
Proposes Gradient Re-parameterization, which embeds model-specific priors into the optimizer, improving the training efficiency and performance of simple models without extra forward/backward computation.
Findings
RepOpt-VGG matches or exceeds the performance of recent well-designed models.
RepOptimizers require no extra forward/backward computations.
RepOpt-VGG is efficient, with high inference speed.
Abstract
The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. Such a methodology is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and…
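The core idea behind "modifying the gradients according to a set of model-specific hyper-parameters" can be checked on a toy case. The sketch below is illustrative only, not the authors' implementation, and rests on stated assumptions: a linear two-branch layer y = alpha_a * x W_a + alpha_b * x W_b with constant (non-trainable) scales, a squared-error loss, and plain SGD without momentum or weight decay. All names (W_a, W_b, grad_mult, etc.) are hypothetical. Under these assumptions, training the two branches with SGD gives the same merged weight as training a single plain layer whose gradient is multiplied by the constant alpha_a^2 + alpha_b^2.

```python
import numpy as np

# Toy sketch (assumptions: linear layer, constant branch scales, plain SGD,
# no momentum or weight decay). Not the paper's actual implementation.
rng = np.random.default_rng(0)

n, d_in, d_out = 32, 8, 4
x = rng.normal(size=(n, d_in))
target = rng.normal(size=(n, d_out))


def grad_wrt_merged(W, x, target):
    """Gradient of L = 0.5 * mean_i ||x_i W - target_i||^2 w.r.t. weight W."""
    residual = x @ W - target
    return x.T @ residual / x.shape[0]


# Constant, non-trainable branch scales: the "model-specific hyper-parameters".
alpha_a, alpha_b = 1.0, 0.5
lr, steps = 0.1, 50

# (1) Two-branch model trained with ordinary SGD.
W_a = rng.normal(size=(d_in, d_out)) * 0.1
W_b = rng.normal(size=(d_in, d_out)) * 0.1

# (2) Single plain layer initialised to the merged weight and trained with
#     SGD whose gradient is scaled by a constant multiplier.
W_merged = alpha_a * W_a + alpha_b * W_b
grad_mult = alpha_a ** 2 + alpha_b ** 2

for _ in range(steps):
    # Chain rule: dL/dW_a = alpha_a * dL/dW', dL/dW_b = alpha_b * dL/dW'.
    g = grad_wrt_merged(alpha_a * W_a + alpha_b * W_b, x, target)
    W_a -= lr * alpha_a * g
    W_b -= lr * alpha_b * g
    # Gradient re-parameterization: rescale the plain layer's gradient.
    W_merged -= lr * grad_mult * grad_wrt_merged(W_merged, x, target)

equivalent = alpha_a * W_a + alpha_b * W_b
print(np.max(np.abs(equivalent - W_merged)))  # only floating-point error remains
```

The equivalence follows from one step of algebra: the merged weight changes by alpha_a * dW_a + alpha_b * dW_b = -lr * (alpha_a^2 + alpha_b^2) * g, so the prior encoded by the branch structure moves into a constant gradient multiplier, and the single-branch model needs no extra forward or backward passes.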
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent · Balanced Selection
