Sufficient Exploration for Convex Q-learning

Fan Lu; Prashant Mehta; Sean Meyn; Gergely Neu

arXiv:2210.09409 · math.OC · October 19, 2022

TL;DR

This paper studies convex Q-learning, the dual counterpart of logistic Q-learning within the linear programming approach to optimal control. It shows that convex Q-learning can succeed in cases where standard Q-learning fails, and it addresses numerical challenges through regularization and state-dependent sampling.
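
The sketch below shows the linear-programming idea behind convex Q-learning in its simplest form. It assumes a finite state and action space, a discounted cost with nonnegative stage costs, and a known transition model. These are illustrative assumptions; the paper's setting is more general. The Bellman operator is (TQ)(x, u) = c(x, u) + γ Σ_y P(y | x, u) min_u' Q(y, u'). Because min_u' Q(y, u') is concave in Q, the constraint set {Q : Q ≤ TQ} is convex. Every Q in that set lies below the optimal Q-function Q*, so maximizing a positively weighted sum of Q over the set recovers Q*. The toy MDP numbers are hypothetical.

```python
# Toy discounted-cost MDP with hypothetical numbers: 2 states, 2 actions.
GAMMA = 0.9
N_X, N_U = 2, 2
COST = [[1.0, 2.0],   # c(0, u) for u = 0, 1
        [0.0, 0.5]]   # c(1, u) for u = 0, 1
# P[x][u][y] = probability of moving from state x to state y under action u
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]


def bellman(q):
    """(TQ)(x, u) = c(x, u) + gamma * sum_y P(y | x, u) * min_u' Q(y, u')."""
    q_min = [min(row) for row in q]
    return [[COST[x][u] + GAMMA * sum(P[x][u][y] * q_min[y] for y in range(N_X))
             for u in range(N_U)]
            for x in range(N_X)]


def is_feasible(q, tol=1e-12):
    """Check the LP constraint Q <= TQ entrywise."""
    tq = bellman(q)
    return all(q[x][u] <= tq[x][u] + tol for x in range(N_X) for u in range(N_U))


def objective(q):
    """LP objective <mu, Q> with uniform weights mu(x, u) = 1."""
    return sum(sum(row) for row in q)


# Q = 0 is feasible because all costs are nonnegative; since T is monotone,
# each iterate Q_{k+1} = T Q_k stays feasible and the objective never decreases.
q = [[0.0] * N_U for _ in range(N_X)]
objectives = []
for _ in range(300):
    assert is_feasible(q)
    objectives.append(objective(q))
    q = bellman(q)
q_star = q

# Raising Q* by a constant eps > 0 breaks feasibility: T(Q* + eps) = Q* + gamma * eps.
q_above = [[v + 0.1 for v in row] for row in q_star]
print(is_feasible(q_star), is_feasible(q_above))  # True False
```

This only covers the exact, tabular program. It says nothing about function approximation or sampled constraints, which is where the paper's questions about regularization and boundedness arise.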

Contribution

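The paper characterizes the structure of the dual of convex Q-learning and gives a sufficient condition for the Q-learning LP to have a bounded solution. It also shows that convex Q-learning succeeds in cases where standard Q-learning diverges, such as the linear quadratic regulator (LQR).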
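
For LQR, the optimal Q-function is a quadratic form in (x, u), so a quadratic function class contains it exactly. That gives a ground truth for checking learned estimates. The sketch below assumes a scalar, undiscounted model with hypothetical parameters a, b, q_cost and r_cost. It computes Q* from the Riccati recursion and checks the Bellman equation at one state-action pair. It does not reproduce the paper's experiments or the divergence of standard Q-learning.

```python
# Scalar discrete-time LQR with hypothetical parameters:
#   x_{k+1} = a x_k + b u_k,  stage cost c(x, u) = q_cost x^2 + r_cost u^2.
a, b = 1.2, 1.0            # |a| > 1, so the open loop is unstable
q_cost, r_cost = 1.0, 1.0


def riccati_step(p):
    """Riccati map p -> q_cost + a^2 p - (a b p)^2 / (r_cost + b^2 p)."""
    return q_cost + a * a * p - (a * b * p) ** 2 / (r_cost + b * b * p)


# Iterate from p = 0; for this stabilizable model it converges to the fixed point P.
p = 0.0
for _ in range(200):
    p = riccati_step(p)

# Q*(x, u) = c(x, u) + P (a x + b u)^2 = [x, u] M [x, u]^T
M = [[q_cost + a * a * p, a * b * p],
     [a * b * p, r_cost + b * b * p]]


def q_function(x, u):
    return M[0][0] * x * x + 2 * M[0][1] * x * u + M[1][1] * u * u


# Minimizing over u gives the feedback u = -K x, and min_u Q*(x, u) = P x^2.
K = M[0][1] / M[1][1]

# Bellman check at one pair: Q*(x, u) = c(x, u) + min_u' Q*(x', u'), x' = a x + b u.
x, u = 0.3, -0.5
x_next = a * x + b * u
residual = q_function(x, u) - (q_cost * x * x + r_cost * u * u
                               + q_function(x_next, -K * x_next))
print(f"P = {p:.6f}, closed-loop a - bK = {a - b * K:.4f}, residual = {residual:.1e}")
```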

Findings

1. Convex Q-learning can succeed where standard Q-learning diverges.
2. Regularization is necessary to prevent over-fitting in convex Q-learning.
3. State-dependent sampling mitigates numerical challenges in sampled-data systems derived from continuous-time models.
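
The third finding concerns sampled-data models. One well-known source of numerical difficulty is that as the sampling period Δ shrinks, the Q-function depends on the action only through terms of order Δ. The matrix of a quadratic Q-function then becomes ill-conditioned. This effect is offered purely as an illustration; the paper's own diagnosis may differ. The sketch Euler-discretizes a scalar continuous-time LQR problem with hypothetical parameters and reports the condition number of the Q-function matrix for several values of Δ. It does not implement state-dependent sampling.

```python
import math

# Continuous-time scalar model (hypothetical parameters):
#   dx/dt = a x + b u,  running cost q_cost x^2 + r_cost u^2.
a, b, q_cost, r_cost = -1.0, 1.0, 1.0, 1.0


def sampled_q_matrix(delta, tol=1e-13, max_iter=10**6):
    """Euler-sample with period delta, solve the discrete Riccati equation by
    iteration, and return (M, P) where Q(x, u) = [x, u] M [x, u]^T."""
    A, B = 1.0 + a * delta, b * delta
    Qd, Rd = q_cost * delta, r_cost * delta
    p = 0.0
    for _ in range(max_iter):
        p_next = Qd + A * A * p - (A * B * p) ** 2 / (Rd + B * B * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    M = [[Qd + A * A * p, A * B * p],
         [A * B * p, Rd + B * B * p]]
    return M, p


def condition_number(M):
    """Condition number of a symmetric positive definite 2x2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam_max = (tr + math.sqrt((M[0][0] - M[1][1]) ** 2 + 4 * M[0][1] ** 2)) / 2
    lam_min = det / lam_max          # avoids cancellation in the smaller root
    return lam_max / lam_min


deltas = [0.1, 0.01, 0.001]
conds = []
for delta in deltas:
    M, p = sampled_q_matrix(delta)
    conds.append(condition_number(M))
    print(f"delta={delta:g}  P={p:.4f}  cond(M)={conds[-1]:.1f}")
```

The condition number grows roughly like 1/Δ. The largest eigenvalue of M stays near the continuous-time value weight P, while the smallest is close to r_cost·Δ.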

Abstract

In recent years there has been a collective research effort to find new formulations of reinforcement learning that are simultaneously more efficient and more amenable to analysis. This paper concerns one approach that builds on Manne's linear programming (LP) formulation of optimal control. A primal version is called logistic Q-learning, and a dual variant is convex Q-learning. This paper focuses on the latter, while building bridges with the former. The main contributions are as follows: (i) The dual of convex Q-learning is not precisely Manne's LP or a version of logistic Q-learning, but it has a similar structure that reveals the need for regularization to avoid over-fitting. (ii) A sufficient condition is obtained for a bounded solution to the Q-learning LP. (iii) Simulation studies reveal numerical challenges when addressing sampled-data systems based on a continuous-time model. The…
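
Contribution (ii) concerns when the Q-learning LP has a bounded solution. The toy below shows why exploration matters. It assumes a simplified model-based program in which the Bellman inequality is imposed only at state-action pairs that appear in a hypothetical data set. This is a simplification, not the paper's formulation and not its sufficient condition. A pair that is never visited has no constraint of its own. Raising its entry Q(x, u) can only increase or leave unchanged min_u' Q(x, u'), so the sampled constraints stay satisfied while the objective grows without bound.

```python
# Toy MDP with hypothetical numbers: 2 states, 2 actions.
GAMMA = 0.9
STATES, ACTIONS = (0, 1), (0, 1)
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.0, (1, 1): 0.5}
# P[(x, u)][y] = probability of moving from x to y under action u
P = {(0, 0): (0.8, 0.2), (0, 1): (0.1, 0.9),
     (1, 0): (0.5, 0.5), (1, 1): (0.0, 1.0)}


def bellman_rhs(q, x, u):
    """c(x, u) + gamma * sum_y P(y | x, u) * min_u' Q(y, u')."""
    return COST[(x, u)] + GAMMA * sum(
        P[(x, u)][y] * min(q[(y, v)] for v in ACTIONS) for y in STATES)


def feasible(q, constrained_pairs, tol=1e-12):
    """Impose Q(x, u) <= (TQ)(x, u) only on the given state-action pairs."""
    return all(q[xu] <= bellman_rhs(q, *xu) + tol for xu in constrained_pairs)


def objective(q):
    """<mu, Q> with uniform weights mu(x, u) = 1 over all pairs."""
    return sum(q.values())


all_pairs = [(x, u) for x in STATES for u in ACTIONS]
observed = [(0, 0), (0, 1), (1, 0)]   # hypothetical data set: (1, 1) never tried

for big in (10.0, 1e3, 1e6):
    q = {xu: 0.0 for xu in all_pairs}
    q[(1, 1)] = big   # the unvisited entry is free to grow
    print(big, objective(q), feasible(q, observed), feasible(q, all_pairs))
```

Each printed line shows an objective equal to the chosen value. The sampled constraints are satisfied, while the fully constrained program is violated.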

Taxonomy

Topics: Advanced Control Systems Optimization · Control Systems and Identification · Adaptive Dynamic Programming Control

Methods: Q-Learning