Data Sampling Affects the Complexity of Online SGD over Dependent Data

Shaocong Ma; Ziyi Chen; Yi Zhou; Kaiyi Ji; Yingbin Liang

arXiv:2204.00006 · cs.LG · April 4, 2022

TL;DR

This paper investigates how different data sampling schemes affect the sample complexity of online SGD on highly dependent data, showing that periodic subsampling and mini-batch sampling can significantly improve convergence.

Contribution

The paper provides a theoretical analysis of how data sampling schemes affect the sample complexity of online SGD under data dependence, and introduces sampling methods that improve on standard online SGD for dependent data.

Findings

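1. Periodic data subsampling improves the sample complexity of online SGD over standard online SGD across the full range of data dependence levels.
2. Even subsampling only a subset of the data samples can accelerate the convergence of online SGD on dependent data.
3. Mini-batch sampling further improves sample efficiency.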
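The three sampling schemes in the findings can be illustrated with a small simulation. This is a toy sketch, not the authors' algorithm or experiment: it assumes a hypothetical two-state "sticky" Markov chain as the dependent data source and the simple objective f(θ; x) = ½(θ − x)², and the helper names `sticky_chain` and `online_sgd` are invented for this example. Periodic subsampling keeps one sample out of every `period`, and `batch` averages the gradient over several kept samples before each update.

```python
import random


def sticky_chain(n, stay_prob=0.9, seed=0):
    """Yield n states of a hypothetical two-state Markov chain on {0.0, 1.0}.

    With probability stay_prob the chain repeats its current state, so
    consecutive samples are strongly dependent. A finite, irreducible,
    aperiodic chain like this one is uniformly ergodic, hence phi-mixing.
    Its stationary distribution is uniform, so the stationary mean is 0.5.
    """
    rng = random.Random(seed)
    state = rng.choice([0.0, 1.0])  # start from the stationary distribution
    for _ in range(n):
        yield state
        if rng.random() >= stay_prob:
            state = 1.0 - state  # switch state with probability 1 - stay_prob


def online_sgd(stream, period=1, batch=1):
    """Toy online SGD on f(theta; x) = 0.5 * (theta - x) ** 2.

    period: keep one sample out of every `period` (periodic subsampling);
            skipped samples are consumed from the stream but never used.
    batch:  average the gradient over `batch` kept samples per update.
    period=1, batch=1 is standard online SGD. Returns (theta, num_updates).
    """
    theta, updates, buf = 0.0, 0, []
    for i, x in enumerate(stream):
        if i % period != period - 1:
            continue  # discarded by the subsampler
        buf.append(x)
        if len(buf) < batch:
            continue  # keep filling the mini-batch
        updates += 1
        grad = sum(theta - xi for xi in buf) / batch  # mini-batch gradient
        theta -= grad / updates  # step size eta_t = 1 / t
        buf.clear()
    return theta, updates


if __name__ == "__main__":
    n = 10_000
    for name, kw in [("standard", {}),
                     ("periodic r=10", {"period": 10}),
                     ("mini-batch B=10", {"batch": 10})]:
        theta, t = online_sgd(sticky_chain(n), **kw)
        print(f"{name:16s} updates={t:5d} |theta - 0.5|={abs(theta - 0.5):.4f}")
```

With step size η_t = 1/t, each iterate equals the running mean of the (batch-averaged) samples used so far, so the final error directly reflects how correlated those samples are. The toy only shows the mechanics of the three schemes; it does not reproduce the paper's sample complexity bounds.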

Abstract

Conventional machine learning applications typically assume that data samples are independently and identically distributed (i.i.d.). However, practical scenarios often involve a data-generating process that produces highly dependent data samples, which are known to heavily bias the stochastic optimization process and slow down the convergence of learning. In this paper, we conduct a fundamental study on how different stochastic data sampling schemes affect the sample complexity of online stochastic gradient descent (SGD) over highly dependent data. Specifically, with a $\phi$-mixing model of data dependence, we show that online SGD with proper periodic data-subsampling achieves an improved sample complexity over the standard online SGD in the full spectrum of the data dependence level. Interestingly, even subsampling a subset of data samples can accelerate the convergence of online SGD…
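For reference, the standard definition of the $\phi$-mixing coefficient of a process $(x_t)$ is given below; the paper's exact conventions may differ. Here $\mathcal{F}_i^j$ denotes the σ-field generated by $x_i, \dots, x_j$. The second line is a direct consequence of the definition (the subsampled σ-fields are sub-σ-fields separated by a gap of $kr$) and suggests why periodic subsampling helps: keeping every $r$-th sample yields a process whose dependence at lag $k$ is controlled by $\phi(kr)$ rather than $\phi(k)$.

```latex
\phi(k) = \sup_{t \ge 1}\;
  \sup_{A \in \mathcal{F}_1^{t},\; B \in \mathcal{F}_{t+k}^{\infty},\; \mathbb{P}(A) > 0}
  \bigl| \mathbb{P}(B \mid A) - \mathbb{P}(B) \bigr|,
\qquad (x_t) \text{ is } \phi\text{-mixing if } \phi(k) \to 0 \text{ as } k \to \infty.

\tilde{x}_i := x_{ir} \;\Longrightarrow\; \tilde{\phi}(k) \le \phi(kr)
\quad \text{for all } k \ge 1.
```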

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent