Adapting to Mixing Time in Stochastic Optimization with Markovian Data
Ron Dorfman, Kfir Y. Levy

TL;DR
This paper introduces a stochastic optimization method that adapts to the unknown mixing time of Markovian data, achieving the optimal asymptotic convergence rate on convex problems without prior knowledge of the chain's mixing properties.
Contribution
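The paper presents the first optimization method that requires no knowledge of the mixing time yet attains the optimal asymptotic convergence rate on convex problems with Markovian data. The approach also extends to finding stationary points in non-convex optimization and to temporal difference (TD) learning.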
Findings
Achieves the optimal asymptotic convergence rate on convex problems.
Extends to finding stationary points in non-convex optimization with Markovian data.
Obtains better dependence on the mixing time in temporal difference (TD) learning.
Abstract
We consider stochastic optimization problems where data is drawn from a Markov chain. Existing methods for this setting crucially rely on knowing the mixing time of the chain, which in real-world applications is usually unknown. We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. We further show that our approach can be extended to: (i) finding stationary points in non-convex optimization with Markovian data, and (ii) obtaining better dependence on the mixing time in temporal difference (TD) learning; in both cases, our method is completely oblivious to the mixing time. Our method relies on a novel combination of multi-level Monte Carlo (MLMC) gradient estimation together with an adaptive learning method.
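The abstract names the two ingredients but does not spell them out. The following is a minimal sketch of how they could fit together, not the authors' implementation. It assumes a common MLMC construction (a geometrically distributed level J, with a telescoping correction between gradient averages over 2^J and 2^(J-1) consecutive chain samples) and an AdaGrad-style step size. The two-state chain, the quadratic loss, and every function and parameter name are hypothetical choices made for illustration.

```python
import math
import random


def markov_chain(rng, p_stay=0.9):
    """Hypothetical two-state chain on {-1, +1}; its stationary law is uniform."""
    state = 1
    while True:
        if rng.random() > p_stay:
            state = -state
        yield state


def grad(x, z):
    """Gradient in x of the toy loss f(x; z) = 0.5 * (x - z)**2."""
    return x - z


def mlmc_gradient(x, chain, rng, max_level=8):
    """Assumed MLMC estimator: g0 + 2^J * (g^J - g^(J-1)), truncated at max_level."""
    # Draw J with P(J = j) = 2^(-j) for j = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1
    if level > max_level:
        # Truncated level: fall back to a single-sample gradient.
        return grad(x, next(chain))
    n = 2 ** level
    # Gradients at n consecutive samples of the chain.
    grads = [grad(x, next(chain)) for _ in range(n)]
    g_fine = sum(grads) / n                    # average over 2^J samples
    g_coarse = sum(grads[: n // 2]) / (n // 2)  # average over 2^(J-1) samples
    return grads[0] + n * (g_fine - g_coarse)


def adagrad_mlmc(steps=2000, seed=0, diameter=2.0):
    """Projected AdaGrad-style updates on [-1, 1] driven by MLMC gradients.

    The optimizer never reads p_stay, so it uses no mixing-time information.
    """
    rng = random.Random(seed)
    chain = markov_chain(rng)
    x, sq_sum, avg = 1.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = mlmc_gradient(x, chain, rng)
        sq_sum += g * g
        if sq_sum > 0.0:
            # Step size shrinks with the accumulated squared gradient norms.
            x -= diameter / math.sqrt(sq_sum) * g
        x = max(-1.0, min(1.0, x))  # projection onto the feasible interval
        avg += (x - avg) / t        # running average of the iterates
    return avg


if __name__ == "__main__":
    print(adagrad_mlmc())
```

In this sketch the step size depends only on observed gradient norms, which is the sense in which an adaptive method can avoid a mixing-time-dependent tuning parameter. The truncation level `max_level` is an illustrative cap on how many consecutive samples a single estimate may consume.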
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Bayesian Methods and Mixture Models
