Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems
Alec Koppel, Ekaterina Tolstaya, Ethan Stump, Alejandro Ribeiro

TL;DR
This paper introduces KQ-Learning, a nonparametric stochastic gradient method for continuous state-action Markov Decision Problems, combining kernel methods with orthogonal matching pursuit to efficiently learn policies with proven convergence.
Contribution
The paper develops a kernel-based stochastic quasi-gradient algorithm with complexity control for continuous MDPs and proves that it converges to a stationary point.
Findings
Converges with probability 1 to a stationary point.
Achieves low Bellman error with constant learning rates.
Produces competitive policies on benchmark tasks.
Abstract
We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman's optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of the stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs that grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a…
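The growth in model size that the abstract describes can be seen in a small sketch. This is not the paper's algorithm: it runs a plain functional stochastic gradient step on a toy regression target (not a Bellman target), and it omits the quasi-gradient averaging and the Kernel Orthogonal Matching Pursuit compression entirely. The class name `KernelQ`, the Gaussian kernel, the bandwidth, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state-action vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))

class KernelQ:
    """Hypothetical container: Q(s, a) = sum_n w_n * k((s_n, a_n), (s, a))."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth
        self.points = []   # stored past state-action pairs (the dictionary)
        self.weights = []  # one scalar weight per stored pair

    @staticmethod
    def _stack(state, action):
        # Concatenate state and action into a single kernel input vector.
        return np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]).astype(float)

    def __call__(self, state, action):
        x = self._stack(state, action)
        return sum(w * gaussian_kernel(p, x, self.bandwidth)
                   for w, p in zip(self.weights, self.points))

    def add(self, state, action, weight):
        self.points.append(self._stack(state, action))
        self.weights.append(float(weight))

rng = np.random.default_rng(0)
q = KernelQ(bandwidth=0.5)
step_size = 0.1  # assumed constant step size, for illustration only

for t in range(50):
    s = rng.uniform(-1, 1, size=2)   # toy 2-D state
    a = rng.uniform(-1, 1, size=1)   # toy 1-D action
    target = np.sin(s.sum()) + a[0]  # toy regression target, NOT a Bellman target
    error = target - q(s, a)
    # Functional SGD on squared error appends one kernel atom per sample,
    # so the number of stored points equals the iteration count.
    q.add(s, a, step_size * error)

model_order = len(q.points)
print("stored state-action pairs after 50 steps:", model_order)
```

Because every step appends one state-action pair, evaluating `q` costs time linear in the number of iterations so far, which is the complexity growth the abstract says the compression step is meant to control.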
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods
