CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul, Boonserm Kijsirikul

TL;DR
CProp is a gradient scaling method that adaptively adjusts the learning rate based on past gradient conformity, improving training speed for various optimizers.
Contribution
It introduces CProp, a novel gradient scaling technique that dynamically adapts learning rates based on gradient agreement, applicable to any optimizer.
Findings
CProp yields significant training speed improvements on both SGD and Adam.
CProp effectively adjusts learning rates based on gradient conformity.
Code is publicly available for reproducibility.
Abstract
Most optimizers including stochastic gradient descent (SGD) and its adaptive gradient derivatives face the same problem where an effective learning rate during the training is vastly different. A learning rate scheduling, mostly tuned by hand, is usually employed in practice. In this paper, we propose CProp, a gradient scaling method, which acts as a second-level learning rate adapting throughout the training process based on cues from past gradient conformity. When the past gradients agree on direction, CProp keeps the original learning rate. On the contrary, if the gradients do not agree on direction, CProp scales down the gradient proportionally to its uncertainty. Since it works by scaling, it could apply to any existing optimizer extending its learning rate scheduling capability. We put CProp to a series of tests showing significant gain in training speed on both SGD and adaptive…
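The abstract describes the scaling rule only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that "conformity" for a single parameter can be estimated from the mean and standard error of a short window of its past gradients, and that the scale is the confidence that the mean gradient has a definite sign. The names `conformity_scale`, `scaled_sgd_step`, `overconfidence`, and the windowed statistics are all hypothetical choices made for illustration.

```python
import math


def conformity_scale(past_grads, overconfidence=1.0, eps=1e-8):
    """Return a scale in [0, 1] for one parameter from its past gradients.

    Assumption (not from the source): conformity is measured as
    |2 * P(mean gradient > 0) - 1| under a normal approximation.
    """
    n = len(past_grads)
    if n < 2:
        # Too little history to judge agreement: keep the original step.
        return 1.0
    mean = sum(past_grads) / n
    var = sum((g - mean) ** 2 for g in past_grads) / (n - 1)
    std_err = math.sqrt(var / n) + eps
    z = mean / std_err
    # Standard normal CDF at z, written with the error function.
    p_positive = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    confidence = abs(2.0 * p_positive - 1.0)
    # Agreeing gradients give a scale near 1; conflicting ones near 0.
    return min(1.0, overconfidence * confidence)


def scaled_sgd_step(param, grad, past_grads, lr=0.1):
    """Plain SGD update with the gradient multiplied by the conformity scale."""
    return param - lr * conformity_scale(past_grads) * grad


agreeing = [1.0, 1.1, 0.9, 1.0]      # same direction every step
conflicting = [1.0, -1.0, 1.0, -1.0]  # direction flips every step

print(conformity_scale(agreeing))
print(conformity_scale(conflicting))
print(scaled_sgd_step(1.0, 0.5, agreeing))
print(scaled_sgd_step(1.0, 0.5, conflicting))
```

Because the scale only multiplies the gradient, the same wrapper could in principle sit in front of any base optimizer, which matches the abstract's claim that the method works by scaling. The choice of a fixed window rather than running averages is a simplification made here for brevity.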
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adam · Stochastic Gradient Descent
