Convex Q Learning in a Stochastic Environment: Extended Version

Fan Lu; Sean Meyn

arXiv:2309.05105 · math.OC · September 12, 2023 · 2 cites

Open Access

TL;DR

This paper develops a convex formulation of Q-learning for Markov decision processes with function approximation, providing new algorithms with convergence guarantees and applications to inventory control.

Contribution

It introduces a convex relaxation of Q-learning, analyzes its properties, and proposes convergent, model-free algorithms with variance reduction techniques.

Findings

01 Bounded solutions under simple basis function conditions

02 Convergence of the proposed algorithms with rate analysis

03 Application demonstrated on an inventory control problem

Abstract

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program and the solution to standard Q-learning. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are…
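The linear programming characterization mentioned in the abstract can be illustrated in the tabular case. The sketch below is not the paper's algorithm and uses no function approximation. It only checks a standard fact for discounted-cost problems: the optimal Q-function satisfies the inequality Q(x,u) <= c(x,u) + gamma * E[min_u' Q(X',u')], and every Q that satisfies it lies below the optimum. The two-state MDP, its costs, its transition probabilities, and all function names are assumptions made up for illustration.

```python
# Toy 2-state, 2-action discounted-cost MDP. All numbers are hypothetical.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
COST = [[1.0, 2.0], [0.5, 3.0]]          # COST[x][u]: one-step cost
P = [                                     # P[x][u][y]: transition probability
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.9, 0.1]],
]


def bellman(q):
    """Bellman operator: (Tq)(x,u) = c(x,u) + gamma * sum_y P(y|x,u) * min_u' q(y,u')."""
    q_min = [min(q[y]) for y in STATES]
    return [
        [
            COST[x][u] + GAMMA * sum(P[x][u][y] * q_min[y] for y in STATES)
            for u in ACTIONS
        ]
        for x in STATES
    ]


def q_star(iters=2000):
    """Approximate the optimal Q-function by repeated application of T."""
    q = [[0.0 for _ in ACTIONS] for _ in STATES]
    for _ in range(iters):
        q = bellman(q)
    return q


def is_feasible(q, tol=1e-9):
    """Check the constraint q(x,u) <= (Tq)(x,u) for every state-action pair."""
    tq = bellman(q)
    return all(q[x][u] <= tq[x][u] + tol for x in STATES for u in ACTIONS)


Q_STAR = q_star()
```

Shifting `Q_STAR` down by a constant keeps it feasible, while shifting it up violates the constraint. This is why maximizing a positively weighted sum of Q-values over the feasible set recovers the optimum in this tabular setting.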

Taxonomy

Topics: Advanced Causal Inference Techniques · Reinforcement Learning in Robotics · Risk and Portfolio Optimization

Methods: Q-Learning