Stochastic Optimization with Importance Sampling
Peilin Zhao, Tong Zhang

TL;DR
This paper studies importance sampling in stochastic optimization algorithms such as prox-SGD and prox-SDCA, and shows through theoretical analysis and experiments that it reduces variance and improves convergence rates.
Contribution
It introduces importance sampling schemes for prox-SGD and prox-SDCA that substantially improve their convergence over uniform sampling.
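The following is a minimal sketch of prox-SGD with importance sampling. It assumes an L1-regularized least-squares objective, samples example i with probability p_i proportional to the per-example smoothness bound L_i = ||x_i||^2, and uses a step size of 0.5 / mean(L_i). The objective, the distribution, and the step size are illustrative assumptions, not the paper's exact scheme, and the function names are hypothetical. The essential ingredient is dividing the sampled gradient by n * p_i. This keeps the stochastic gradient unbiased even though sampling is non-uniform.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(X, y, lam, w):
    """(1/n) * sum_i 0.5 * (x_i . w - y_i)^2 + lam * ||w||_1."""
    return 0.5 * np.mean((X @ w - y) ** 2) + lam * np.sum(np.abs(w))

def prox_sgd_importance(X, y, lam, eta, n_iters, seed=0):
    """Prox-SGD in which example i is drawn with non-uniform probability p_i."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = np.sum(X ** 2, axis=1)   # per-example smoothness constants L_i = ||x_i||^2
    p = L / L.sum()              # assumed sampling distribution: p_i proportional to L_i
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.choice(n, p=p)
        grad_i = (X[i] @ w - y[i]) * X[i]           # gradient of the i-th loss term
        g = grad_i / (n * p[i])                     # reweight so that E[g] = full gradient
        w = soft_threshold(w - eta * g, eta * lam)  # proximal step for the L1 term
    return w

# Synthetic data whose rows have very different norms (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) * rng.uniform(0.1, 5.0, size=(200, 1))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true
L_bar = np.mean(np.sum(X ** 2, axis=1))
w = prox_sgd_importance(X, y, lam=0.01, eta=0.5 / L_bar, n_iters=5000)
print("objective:", objective(X, y, 0.01, w))
```

With this choice of p_i, the reweighted step becomes eta * L_bar * (x_i . w - y_i) * x_i / ||x_i||^2. Rows with a large norm are therefore visited more often but take proportionally smaller steps.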
Findings
Importance sampling reduces variance in stochastic gradients.
Theoretical convergence rates are improved with importance sampling.
Experimental results verify the theoretical benefits of the proposed methods.
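The variance claim can be checked on a small example. The sketch below computes, exactly rather than by simulation, the mean and variance of the single-sample gradient estimator g_i / (n * p_i). It does this once for uniform sampling and once for p_i proportional to ||g_i||. The per-example gradients are synthetic and the function name is hypothetical. By the Cauchy-Schwarz inequality, p_i proportional to ||g_i|| minimizes this variance among all sampling distributions. Because that choice depends on the current iterate, practical schemes typically replace ||g_i|| with a fixed bound such as a per-example Lipschitz or smoothness constant.

```python
import numpy as np

def estimator_moments(G, p):
    """Exact mean and variance of the estimator G[i] / (n * p[i]) with i ~ p.

    G: (n, d) array of per-example gradients; p: sampling probabilities.
    The variance is the total variance E||g - E[g]||^2.
    """
    n = G.shape[0]
    scaled = G / (n * p[:, None])    # value of the estimator when i is drawn
    mean = p @ scaled                # sum_i p_i * G[i] / (n p_i) = mean gradient
    var = p @ np.sum((scaled - mean) ** 2, axis=1)
    return mean, var

# Synthetic per-example gradients with heterogeneous norms (assumed data).
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 5)) * rng.uniform(0.1, 10.0, size=(50, 1))
norms = np.linalg.norm(G, axis=1)
uniform = np.full(50, 1.0 / 50)
importance = norms / norms.sum()     # p_i proportional to ||g_i||

m_u, v_u = estimator_moments(G, uniform)
m_is, v_is = estimator_moments(G, importance)
print(f"uniform variance: {v_u:.3f}, importance-sampling variance: {v_is:.3f}")
```

Both estimators have the same mean, the full gradient, so the reduced variance comes at no cost in bias.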
Abstract
Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a rather high variance, which negatively affects the convergence of the underlying optimization procedure. In this paper we study stochastic optimization with importance sampling, which improves the convergence rate by reducing the stochastic variance. Specifically, we study prox-SGD (actually, stochastic mirror descent) with importance sampling and prox-SDCA with importance sampling. For prox-SGD, instead of adopting uniform sampling throughout the training process, the proposed algorithm employs…
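The following is a minimal sketch of SDCA with importance sampling. It assumes ridge regression, that is, squared loss with the regularizer (lambda/2) * ||w||^2. With that regularizer the proximal map is trivial, so prox-SDCA reduces to plain SDCA. Coordinate i is drawn with probability proportional to 1 + ||x_i||^2 / (lambda * n). This weighting is an assumption for illustration. It was chosen because it is also the denominator of the closed-form coordinate update for the squared loss. All function names are hypothetical.

```python
import numpy as np

def sdca_ridge_importance(X, y, lam, n_iters, seed=0):
    """SDCA for (1/n) sum_i 0.5*(x_i . w - y_i)^2 + (lam/2)*||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    q = 1.0 + np.sum(X ** 2, axis=1) / (lam * n)  # 1 + ||x_i||^2 / (lam n)
    p = q / q.sum()        # assumed importance distribution: p_i proportional to q_i
    alpha = np.zeros(n)    # dual variables
    w = np.zeros(d)        # primal iterate, kept equal to X.T @ alpha / (lam n)
    for _ in range(n_iters):
        i = rng.choice(n, p=p)
        # Closed-form maximization of the dual objective over coordinate i.
        delta = (y[i] - X[i] @ w - alpha[i]) / q[i]
        alpha[i] += delta
        w += delta * X[i] / (lam * n)
    return w, alpha

def primal(X, y, lam, w):
    return 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * (w @ w)

def dual(X, y, lam, alpha):
    n = X.shape[0]
    w_alpha = X.T @ alpha / (lam * n)
    # For the squared loss, -phi_i^*(-alpha_i) = alpha_i * y_i - 0.5 * alpha_i^2.
    return np.mean(alpha * y - 0.5 * alpha ** 2) - 0.5 * lam * (w_alpha @ w_alpha)

# Synthetic regression data with heterogeneous row norms (assumed for illustration).
rng = np.random.default_rng(2)
n, d, lam = 100, 5, 0.1
X = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=(n, 1))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w, alpha = sdca_ridge_importance(X, y, lam, n_iters=20000)
print("duality gap:", primal(X, y, lam, w) - dual(X, y, lam, alpha))
```

Unlike the prox-SGD case, this step needs no reweighting. Each update exactly maximizes the dual over the sampled coordinate, so the sampling distribution only changes how often each coordinate is visited.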
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques
