Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems

Alec Koppel; Ekaterina Tolstaya; Ethan Stump; Alejandro Ribeiro

arXiv:1804.07323 · cs.LG · April 23, 2018 · 5 citations

TL;DR

This paper introduces KQ-Learning, a nonparametric stochastic quasi-gradient method for Markov Decision Problems with continuous state and action spaces. It represents the action-value function in a reproducing kernel Hilbert space, uses kernel orthogonal matching pursuit to keep that representation compact, and is proven to converge to a stationary point.
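To make the compression step concrete, here is a minimal sketch of one common variant, destructive kernel orthogonal matching pursuit. It assumes a Gaussian kernel and a compression budget `eps` measured in RKHS norm. The names `gaussian_kernel` and `komp_prune`, and the greedy one-point-at-a-time removal rule, are hypothetical illustrations. They are not taken from the paper or from the katetolstaya/kernelrl repository.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def komp_prune(D, w, eps, bandwidth=1.0, jitter=1e-10):
    """Hypothetical destructive kernel OMP: repeatedly drop the dictionary point whose
    removal (followed by re-projection of f onto the remaining points) costs the least,
    as long as the RKHS-norm error w.r.t. the original, un-pruned f stays <= eps."""
    D, w = np.asarray(D, dtype=float), np.asarray(w, dtype=float)
    K = gaussian_kernel(D, D, bandwidth)
    f_norm2 = w @ K @ w                      # ||f||_H^2 of the original function
    keep, w_keep = list(range(len(D))), w.copy()
    while len(keep) > 1:
        best = None
        for j in keep:
            cand = [i for i in keep if i != j]
            Kcc = K[np.ix_(cand, cand)] + jitter * np.eye(len(cand))
            k = K[cand, :] @ w               # <kappa(d_i, .), f> for each remaining point
            v = np.linalg.solve(Kcc, k)      # weights of the orthogonal projection of f
            err = np.sqrt(max(f_norm2 - k @ v, 0.0))  # ||f - Pf|| via Pythagoras
            if best is None or err < best[0]:
                best = (err, cand, v)
        if best[0] > eps:                    # cheapest removal already exceeds the budget
            break
        _, keep, w_keep = best
    return D[keep], w_keep

# Usage: two identical centers should merge into one center carrying their combined weight.
D2, w2 = komp_prune([[0.0], [0.0], [5.0]], [1.0, 1.0, 1.0], eps=1e-3)
print(D2.ravel(), w2)  # expect centers [0, 5] with weights close to [2, 1]
```

Measuring the error against the original function, rather than against the previous pruned iterate, keeps the total compression error bounded by `eps` regardless of how many points are removed.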

Contribution

It develops a novel kernel-based stochastic quasi-gradient algorithm with complexity control for continuous MDPs, and proves its convergence to a stationary point.
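One way to read "stochastic quasi-gradient" here is the following hedged sketch, written in our own notation rather than quoted from the paper. It uses a regularized Bellman-error objective over an RKHS $\mathcal{H}$ with kernel $\kappa$. The inner expectation over $s'$ sits inside the square, which makes the problem nested (compositional): one sampled transition gives a biased gradient estimate. An auxiliary scalar $z_t$ therefore averages the temporal-difference error across iterations. The regularizer $\lambda$, the step sizes $\alpha_t, \beta_t$, and the exact update form are assumptions, and constant factors are absorbed into $\alpha_t$.

```latex
\begin{aligned}
J(Q) &= \mathbb{E}_{s,a}\Big[\big(\mathbb{E}_{s'}[\, r(s,a,s') + \gamma \max_{a'} Q(s',a') \mid s,a \,] - Q(s,a)\big)^{2}\Big] + \tfrac{\lambda}{2}\|Q\|_{\mathcal{H}}^{2},\\
\delta_t &= r_t + \gamma\, Q_t(s'_t, a'_t) - Q_t(s_t, a_t), \qquad a'_t \in \arg\max_{a'} Q_t(s'_t, a'),\\
z_{t+1} &= (1-\beta_t)\, z_t + \beta_t\, \delta_t,\\
Q_{t+1} &= (1-\alpha_t\lambda)\, Q_t - \alpha_t\, z_{t+1}\big(\gamma\,\kappa((s'_t,a'_t),\cdot) - \kappa((s_t,a_t),\cdot)\big).
\end{aligned}
```

Under this form, each step adds new kernel centers at $(s_t,a_t)$ and $(s'_t,a'_t)$. That is why the representation grows with $t$ unless it is compressed.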

Findings

1. Converges with probability 1 to a stationary point.
2. Achieves low Bellman error with constant learning rates.
3. Produces competitive policies on benchmark tasks.

Abstract

We consider Markov Decision Problems defined over continuous state and action spaces, where an autonomous agent seeks to learn a map from its states to actions so as to maximize its long-term discounted accumulation of rewards. We address this problem by considering Bellman's optimality equation defined over action-value functions, which we reformulate into a nested non-convex stochastic optimization problem defined over a Reproducing Kernel Hilbert Space (RKHS). We develop a functional generalization of the stochastic quasi-gradient method to solve it, which, owing to the structure of the RKHS, admits a parameterization in terms of scalar weights and past state-action pairs that grows proportionately with the algorithm iteration index. To ameliorate this complexity explosion, we apply Kernel Orthogonal Matching Pursuit to the sequence of kernel weights and dictionaries, which yields a…
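The abstract notes that the iterate is parameterized by scalar weights and past state-action pairs, and that this dictionary grows with the iteration index. The sketch below illustrates that growth under several assumptions. It uses a Gaussian kernel and a finite grid of candidate actions for the max, whereas the paper targets continuous actions. It also uses an SCGD-style update with an auxiliary tracking variable `z`, which is our assumed form, not the authors' code. `KernelQ`, `scgd_step`, and the toy dynamics are hypothetical.

```python
import numpy as np

class KernelQ:
    """Q(s, a) = sum_n w_n * kappa(d_n, (s, a)), where each dictionary element d_n
    is a stored state-action vector (hypothetical sketch, Gaussian kernel assumed)."""
    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth
        self.D = []  # dictionary: concatenated state-action vectors
        self.w = []  # one scalar weight per dictionary element

    def kernel(self, x, y):
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * self.bandwidth ** 2))

    def __call__(self, s, a):
        x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)]).astype(float)
        return sum(wn * self.kernel(dn, x) for dn, wn in zip(self.D, self.w))

def scgd_step(Q, z, s, a, r, s_next, actions, gamma=0.9, alpha=0.1, beta=0.5, lam=1e-3):
    """One assumed SCGD-style functional quasi-gradient step; returns the updated z."""
    # Greedy next action, searched over a finite candidate grid (an assumption).
    q_next = [Q(s_next, b) for b in actions]
    a_star = actions[int(np.argmax(q_next))]
    delta = r + gamma * max(q_next) - Q(s, a)  # temporal-difference error
    z = (1.0 - beta) * z + beta * delta        # running estimate of the inner expectation
    # Shrink existing weights (RKHS regularizer), then append two new kernel centers.
    Q.w = [(1.0 - alpha * lam) * wn for wn in Q.w]
    x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)]).astype(float)
    x_next = np.concatenate([np.atleast_1d(s_next), np.atleast_1d(a_star)]).astype(float)
    Q.D += [x, x_next]
    Q.w += [alpha * z, -alpha * gamma * z]
    return z

# Usage on made-up 1-D dynamics: the dictionary grows by two points per step.
rng = np.random.default_rng(0)
Q, z = KernelQ(bandwidth=0.5), 0.0
actions = np.linspace(-1.0, 1.0, 5)
for t in range(10):
    s, a = rng.uniform(-1.0, 1.0), rng.choice(actions)
    s_next = np.clip(s + 0.1 * a, -1.0, 1.0)
    z = scgd_step(Q, z, s, a, -s_next ** 2, s_next, actions)
print(len(Q.D))  # 20
```

Without compression, evaluating `Q` costs time linear in the dictionary size, which itself grows linearly with the number of steps. This is the complexity explosion that the abstract addresses with Kernel Orthogonal Matching Pursuit.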

Code & Models

Repositories

katetolstaya/kernelrl (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Markov Chains and Monte Carlo Methods