Convex Q Learning in a Stochastic Environment: Extended Version
Fan Lu, Sean Meyn

TL;DR
This paper develops a convex formulation of Q-learning for Markov decision processes with function approximation, providing new algorithms with convergence guarantees and applications to inventory control.
Contribution
It introduces a convex relaxation of Q-learning, analyzes its properties, and proposes convergent, model-free algorithms with variance reduction techniques.
Findings
Bounded solutions under simple basis function conditions
Convergence of the proposed algorithms with rate analysis
Application demonstrated on an inventory control problem
Abstract
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program and the solution to standard Q-learning. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are…
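To make the linear programming viewpoint mentioned in the abstract more concrete, here is a minimal sketch in Python. It assumes the textbook discounted-cost setting, which may differ from the exact setting and relaxation used in the paper. In that setting the optimal Q-function satisfies Q(x,u) = c(x,u) + γ E[min_u' Q(X',u')], and any Q meeting the inequality Q(x,u) ≤ c(x,u) + γ E[min_u' Q(X',u')] is feasible for the associated convex program. The two-state MDP, its costs, its transition probabilities, and all function names below are hypothetical and chosen only for illustration. The sketch computes Q* by value iteration and checks feasibility; it does not implement the paper's algorithms.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
# P[(x, u)][y] = probability of moving from state x to state y under action u
P = {
    (0, 0): [0.8, 0.2],
    (0, 1): [0.1, 0.9],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.9, 0.1],
}


def bellman_rhs(q, x, u):
    """Return c(x,u) + gamma * E[min_u' q(X', u') | x, u]."""
    expected_min = sum(
        P[(x, u)][y] * min(q[(y, a)] for a in ACTIONS) for y in STATES
    )
    return COST[(x, u)] + GAMMA * expected_min


def value_iteration(num_iters=2000):
    """Iterate the Bellman operator from zero to approximate Q*."""
    q = {(x, u): 0.0 for x in STATES for u in ACTIONS}
    for _ in range(num_iters):
        q = {(x, u): bellman_rhs(q, x, u) for x in STATES for u in ACTIONS}
    return q


def is_feasible(q, tol=1e-8):
    """Check the constraint q(x,u) <= c(x,u) + gamma * E[min q] for all pairs."""
    return all(
        q[(x, u)] <= bellman_rhs(q, x, u) + tol
        for x in STATES
        for u in ACTIONS
    )


q_star = value_iteration()
q_shifted = {k: v - 1.0 for k, v in q_star.items()}  # uniformly below Q*
q_too_big = {k: v + 1.0 for k, v in q_star.items()}  # uniformly above Q*

print(is_feasible(q_star))     # True: Q* meets the constraint with equality
print(is_feasible(q_shifted))  # True: shifting down keeps feasibility
print(is_feasible(q_too_big))  # False: shifting up violates the constraint
```

In this toy example, shifting Q* down stays feasible while shifting it up does not, which is consistent with Q* being the largest feasible point. That is why maximizing a positively weighted sum of Q-values over the feasible set recovers Q* in the textbook setting.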
Taxonomy
Topics: Advanced Causal Inference Techniques · Reinforcement Learning in Robotics · Risk and Portfolio Optimization
Methods: Q-Learning
