Stochastic Gradient Variance Reduction by Solving a Filtering Problem
Xingyi Yang

TL;DR
This paper introduces Filter Gradient Descent (FGD), a novel stochastic optimization method that reduces gradient variance by solving an adaptive filtering problem, leading to more reliable gradient estimates and faster neural network training.
Contribution
The paper presents the first practical integration of filtering techniques into gradient estimation for stochastic optimization, improving convergence and robustness.
Findings
FGD outperforms traditional methods in neural network training.
The method effectively reduces gradient variance in numerical optimization.
FGD accelerates convergence compared to momentum-based algorithms.
Abstract
Deep neural networks (DNNs) are typically optimized using stochastic gradient descent (SGD). However, gradient estimates computed from stochastic samples tend to be noisy and unreliable, resulting in large gradient variance and poor convergence. In this paper, we propose Filter Gradient Descent (FGD), an efficient stochastic optimization algorithm that makes a consistent estimate of the local gradient by solving an adaptive filtering problem with different filter designs. Our method reduces variance in stochastic gradient descent by incorporating historical states to enhance the current estimate. It is able to correct noisy gradient directions as well as accelerate the convergence of learning. We demonstrate the effectiveness of the proposed Filter Gradient Descent on numerical optimization and training neural networks, where it achieves superior and robust…
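To make the idea of "filtering" a gradient concrete, the sketch below runs a stream of noisy scalar gradient observations through a one-dimensional Kalman filter and compares the estimation error before and after filtering. It is an illustration only, not the paper's algorithm: the choice of a Kalman filter, the random-walk model for the true gradient, and the names `kalman_filter_gradients`, `process_var`, `obs_var`, and `demo` are all assumptions made for this example.

```python
import random


def kalman_filter_gradients(noisy_grads, process_var=1e-3, obs_var=1.0):
    """Filter a sequence of noisy scalar gradients with a 1-D Kalman filter.

    Assumed (hypothetical) model, not taken from the paper:
        true gradient:  g_t = g_{t-1} + w_t,  w_t ~ N(0, process_var)
        observation:    z_t = g_t + v_t,      v_t ~ N(0, obs_var)
    """
    estimate, est_var = 0.0, 1.0  # initial state estimate and its variance
    filtered = []
    for z in noisy_grads:
        # Predict: the random-walk model only inflates the uncertainty.
        est_var += process_var
        # Update: blend the prediction with the new noisy observation.
        gain = est_var / (est_var + obs_var)
        estimate += gain * (z - estimate)
        est_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def demo(true_grad=2.0, noise_std=1.0, steps=500, burn_in=100, seed=0):
    """Return (raw_mse, filtered_mse) measured against a constant true gradient."""
    rng = random.Random(seed)
    noisy = [true_grad + rng.gauss(0.0, noise_std) for _ in range(steps)]
    filtered = kalman_filter_gradients(noisy, obs_var=noise_std ** 2)

    def mse(values):
        # Skip the first burn_in steps so the filter's start-up transient is excluded.
        tail = values[burn_in:]
        return sum((v - true_grad) ** 2 for v in tail) / len(tail)

    return mse(noisy), mse(filtered)


if __name__ == "__main__":
    raw_mse, filtered_mse = demo()
    print(f"raw gradient MSE:      {raw_mse:.4f}")
    print(f"filtered gradient MSE: {filtered_mse:.4f}")
```

In this toy setting the filtered estimate tracks the true gradient much more closely than the raw samples because each estimate pools information from earlier observations, which is the same intuition the abstract describes as incorporating historical states. In an actual optimizer the filtered value would replace the raw gradient in the parameter update; the filter designs used by FGD itself are described in the paper, not here.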
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and ELM
