CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity

Konpat Preechakul; Boonserm Kijsirikul

arXiv:1912.11493 · cs.LG · December 30, 2019 · 1 citation

TL;DR

CProp is a gradient scaling method that adaptively adjusts the learning rate based on past gradient conformity, improving training speed for various optimizers.

Contribution

It introduces CProp, a novel gradient scaling technique that dynamically adapts learning rates based on gradient agreement, applicable to any optimizer.

Findings

01

Significant training speed improvements with CProp on SGD and Adam.

02

CProp effectively adjusts learning rates based on gradient conformity.

03

Code is publicly available for reproducibility.

Abstract

Most optimizers, including stochastic gradient descent (SGD) and its adaptive gradient derivatives, face the same problem: the effective learning rate varies vastly over the course of training. Learning rate scheduling, mostly tuned by hand, is usually employed in practice. In this paper, we propose CProp, a gradient scaling method that acts as a second-level learning rate, adapting throughout the training process based on cues from past gradient conformity. When the past gradients agree on direction, CProp keeps the original learning rate. Conversely, if the gradients do not agree on direction, CProp scales down the gradient in proportion to its uncertainty. Since it works by scaling, it can be applied to any existing optimizer, extending its learning rate scheduling capability. We put CProp through a series of tests showing significant gain in training speed on both SGD and adaptive…
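
The abstract describes the behavior (keep the step when past gradients agree, shrink it when they conflict) but not the exact formula. The following is a minimal sketch of that idea for a single scalar parameter, not the authors' implementation. It assumes a normal approximation on the sign of the mean past gradient; the names `conformity_scale` and `scaled_sgd_step` are hypothetical. The actual method is in the phizaz/cprop repository.

```python
import math


def conformity_scale(past_grads, eps=1e-8):
    """Hypothetical helper: map a history of scalar gradients to a factor in [0, 1]."""
    n = len(past_grads)
    mean = sum(past_grads) / n
    var = sum((g - mean) ** 2 for g in past_grads) / n
    # Standard error of the mean; eps avoids division by zero.
    std_err = math.sqrt(var / n) + eps
    # Assumed normal approximation: probability that the mean gradient is positive.
    p_positive = 0.5 * (1.0 + math.erf(mean / std_err / math.sqrt(2.0)))
    # 0 when the sign is a coin flip (p = 0.5), 1 when the sign is certain.
    return abs(2.0 * p_positive - 1.0)


def scaled_sgd_step(weight, grad, lr, past_grads):
    """Plain SGD step with the gradient scaled by the conformity factor."""
    return weight - lr * conformity_scale(past_grads) * grad


agreeing = [0.9, 1.1, 1.0, 1.2]        # consistent direction
conflicting = [1.0, -1.0, 1.0, -1.0]   # direction flips every step
print(conformity_scale(agreeing))
print(conformity_scale(conflicting))
```

With the agreeing history the factor stays close to 1, so the base optimizer's step is essentially unchanged; with the conflicting history it drops to 0 and the step is suppressed. Because the factor only multiplies the gradient, the same scaling could in principle sit in front of any update rule, which is the property the abstract emphasizes.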

Code & Models

Repositories

phizaz/cprop (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adam · Stochastic Gradient Descent