Stochastic Gradient Variance Reduction by Solving a Filtering Problem

Xingyi Yang

arXiv:2012.12418 · cs.LG · May 18, 2021 · 1 citation

TL;DR

This paper introduces Filter Gradient Descent (FGD), a novel stochastic optimization method that reduces gradient variance by solving an adaptive filtering problem, leading to more reliable gradient estimates and faster neural network training.
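The filtering idea can be illustrated with a minimal sketch. It assumes a per-coordinate Kalman filter in which the true gradient follows a random walk and each minibatch gradient is a noisy observation of it. The class name KalmanGradientFilter and the noise variances q and r are hypothetical choices for illustration. They are not the paper's filter designs or its PyTorch code. The filter blends each new observation with the history it has already seen, and the Kalman gain sets the weighting.

```python
import numpy as np


class KalmanGradientFilter:
    """Hypothetical per-coordinate Kalman filter for stochastic gradients.

    Assumed model (illustrative, not taken from the paper):
        true gradient:  g_t = g_{t-1} + w_t,  w_t ~ N(0, q)   (random walk)
        observation:    y_t = g_t + v_t,      v_t ~ N(0, r)   (minibatch noise)
    """

    def __init__(self, dim, q=1e-2, r=1.0, p0=1.0):
        self.q = q
        self.r = r
        self.g_hat = np.zeros(dim)   # posterior mean of the true gradient
        self.p = np.full(dim, p0)    # posterior variance, one per coordinate

    def update(self, y):
        """Fold one noisy gradient y into the estimate and return it."""
        # Predict: the random-walk model keeps the mean and inflates the variance.
        p_prior = self.p + self.q
        # Correct: the Kalman gain weighs the new observation against history.
        k = p_prior / (p_prior + self.r)
        self.g_hat = self.g_hat + k * (y - self.g_hat)
        self.p = (1.0 - k) * p_prior
        return self.g_hat


# Usage: estimate a fixed gradient from unit-variance noisy observations.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
kf = KalmanGradientFilter(dim=3)
raw, filtered = [], []
for _ in range(2000):
    y = true_grad + rng.normal(0.0, 1.0, size=3)   # noisy minibatch gradient
    raw.append(y)
    filtered.append(kf.update(y).copy())

# Mean squared error after a burn-in period, raw vs. filtered.
raw_mse = np.mean((np.array(raw[500:]) - true_grad) ** 2)
filtered_mse = np.mean((np.array(filtered[500:]) - true_grad) ** 2)
print(f"raw MSE {raw_mse:.3f}, filtered MSE {filtered_mse:.3f}")
```

With q = 0.01 and r = 1, the gain settles near 0.095, so the estimate roughly averages the last ten observations. A larger q lets the filter follow a changing gradient faster, but it smooths the noise less.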

Contribution

The paper presents the first practical integration of filtering techniques into gradient estimation for stochastic optimization, improving convergence and robustness.

Findings

01

FGD outperforms traditional methods in neural network training.

02

The method effectively reduces gradient variance in numerical optimization.

03

FGD accelerates convergence compared to momentum-based algorithms.
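The comparison with momentum is easier to read once you notice that momentum's exponential moving average, m_t = β m_{t-1} + (1 - β) g_t, is already a fixed-gain low-pass filter. Unrolling the recursion gives m_t = (1 - β) Σ_k β^k g_{t-k}. For i.i.d. noise with variance σ², the steady-state variance is therefore (1 - β)² σ² / (1 - β²) = σ² (1 - β)/(1 + β). The following sketch checks that ratio numerically on synthetic zero-mean noise. It illustrates the momentum baseline only. The function names are hypothetical, and the sketch does not reproduce any FGD result.

```python
import numpy as np


def ema_filter(samples, beta):
    """Normalized EMA used by momentum: m_t = beta*m_{t-1} + (1-beta)*g_t."""
    m = 0.0
    out = np.empty_like(samples)
    for t, g in enumerate(samples):
        m = beta * m + (1.0 - beta) * g
        out[t] = m
    return out


def predicted_variance_ratio(beta):
    # Steady-state Var[m_t] / Var[g_t] for i.i.d. zero-mean input noise.
    return (1.0 - beta) / (1.0 + beta)


rng = np.random.default_rng(1)
noise = rng.normal(0.0, 1.0, size=200_000)   # synthetic gradient noise, sigma = 1
beta = 0.9
smoothed = ema_filter(noise, beta)
empirical = np.var(smoothed[1000:]) / np.var(noise)   # skip the start-up transient
predicted = predicted_variance_ratio(beta)
print(f"predicted {predicted:.4f}, empirical {empirical:.4f}")
```

With β = 0.9 the predicted ratio is 0.1/1.9 ≈ 0.053. The gain 1 - β never changes, so momentum smooths noise and genuine changes in the gradient alike. An adaptive filter such as a Kalman filter instead adjusts its gain from the assumed noise statistics.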

Abstract

Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD). However, gradient estimates computed from stochastic samples tend to be noisy and unreliable, resulting in large gradient variance and poor convergence. In this paper, we propose Filter Gradient Descent (FGD), an efficient stochastic optimization algorithm that makes a consistent estimate of the local gradient by solving an adaptive filtering problem with different filter designs. Our method reduces variance in stochastic gradient descent by incorporating historical states to enhance the current estimate. It can correct noisy gradient directions and accelerate the convergence of learning. We demonstrate the effectiveness of the proposed Filter Gradient Descent on numerical optimization and neural network training, where it achieves superior and robust…
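The abstract's update can be sketched as x_{t+1} = x_t - η · filter(g̃_t), where g̃_t is the stochastic gradient and the filter uses historical states to improve the current estimate before each step. This is a hypothetical illustration on a noisy quadratic, not the authors' optimizer. The helpers make_identity_filter, make_kalman_filter and filtered_gradient_descent are assumptions, and so are the toy objective and all hyperparameters. Because the filter is passed in as an argument, swapping it changes the gradient estimator and leaves the update rule untouched.

```python
import numpy as np


def make_identity_filter():
    # No filtering: the loop below then reduces to plain SGD.
    def step(y):
        return y
    return step


def make_kalman_filter(dim, q=1e-2, r=1.0, p0=1.0):
    # Per-coordinate random-walk Kalman filter; q, r, p0 are assumed values.
    g_hat = np.zeros(dim)
    p = np.full(dim, p0)

    def step(y):
        nonlocal g_hat, p
        p_prior = p + q                       # predict
        k = p_prior / (p_prior + r)           # Kalman gain
        g_hat = g_hat + k * (y - g_hat)       # correct with the new gradient
        p = (1.0 - k) * p_prior
        return g_hat

    return step


def filtered_gradient_descent(noisy_grad, x0, lr, steps, grad_filter):
    """Iterate x <- x - lr * grad_filter(noisy_grad(x))."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_filter(noisy_grad(x))
    return x


# Toy problem (hypothetical): f(x) = 0.5 * ||x||^2, so the true gradient is x,
# observed with additive N(0, 1) noise to mimic minibatch sampling.
rng = np.random.default_rng(2)
dim = 10


def noisy_grad(x):
    return x + rng.normal(0.0, 1.0, size=x.shape)


x0 = np.full(dim, 5.0)
x_sgd = filtered_gradient_descent(noisy_grad, x0, 0.1, 500, make_identity_filter())
x_kf = filtered_gradient_descent(noisy_grad, x0, 0.1, 500, make_kalman_filter(dim))
print(f"|x0| {np.linalg.norm(x0):.2f}, SGD {np.linalg.norm(x_sgd):.2f}, "
      f"Kalman-filtered {np.linalg.norm(x_kf):.2f}")
```

With the identity filter the loop is ordinary SGD. The Kalman variant is one possible filter design among the several the abstract alludes to.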

Code & Models

Repositories

Adamdad/Filter-Gradient-Decent
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and ELM