Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes
Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux

TL;DR
This paper shows that the minmax optimization problem introduced in [22] for deterministic batch mode reinforcement learning is NP-hard. For the two-stage case, it proposes two relaxation schemes that are proven, and illustrated empirically, to give better results than those in [22].
Contribution
It proves that the minmax problem is NP-hard and introduces two relaxation schemes for the two-stage case in deterministic batch mode RL, with a theoretical proof and an empirical illustration that both improve on [22].
Findings
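Both relaxation schemes are proven, and illustrated empirically, to give better results than those in [22].
The first relaxation drops some constraints to obtain a problem that is solvable in polynomial time.
The second relaxation is a Lagrangian relaxation in which all constraints are dualized; it leads to a conic quadratic programming problem.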
Abstract
We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].
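The two relaxation ideas named in the abstract (dropping constraints, and dualizing all constraints in a Lagrangian) can be illustrated on a generic toy minimization problem. The sketch below is not the paper's formulation: the objective, the two constraints, the grid of candidate points, and all function names are hypothetical choices made only for illustration. It shows the standard fact that, for a minimization problem, both kinds of relaxation yield values that are no larger than the exact constrained optimum.

```python
from itertools import product

# Hypothetical toy problem (an assumption, not taken from the paper):
# minimize x0^2 + x1^2 over a finite grid, subject to g_i(x) <= 0.
GRID = [i / 2 for i in range(-4, 5)]        # candidate coordinates -2.0 .. 2.0
MULTIPLIERS = [i / 2 for i in range(0, 9)]  # non-negative multipliers 0.0 .. 4.0
POINTS = list(product(GRID, repeat=2))


def objective(x):
    return x[0] ** 2 + x[1] ** 2


CONSTRAINTS = [
    lambda x: 1.0 - x[0],         # encodes x0 >= 1
    lambda x: 1.5 - x[0] - x[1],  # encodes x0 + x1 >= 1.5
]


def constrained_min(constraints):
    """Exact minimum over the grid points satisfying the given constraints."""
    feasible = [x for x in POINTS if all(g(x) <= 0 for g in constraints)]
    return min(objective(x) for x in feasible)


def lagrangian_value(multipliers):
    """Unconstrained minimum of the Lagrangian for fixed multipliers >= 0."""
    return min(
        objective(x) + sum(m * g(x) for m, g in zip(multipliers, CONSTRAINTS))
        for x in POINTS
    )


# Exact problem: all constraints enforced.
exact = constrained_min(CONSTRAINTS)

# Relaxation by dropping constraints: keep only the first constraint.
dropped = constrained_min(CONSTRAINTS[:1])

# Lagrangian relaxation: dualize every constraint, then take the best
# bound over a coarse grid of multipliers.
dual = max(lagrangian_value((a, b)) for a in MULTIPLIERS for b in MULTIPLIERS)

print("exact:", exact)
print("dropped-constraint bound:", dropped)
print("Lagrangian bound:", dual)
```

Dropping constraints enlarges the feasible set, so the minimum can only decrease. The Lagrangian bound holds because, at any feasible point, each term `m * g(x)` is non-positive when `m >= 0`. How tight either bound is depends on the problem; the sketch only demonstrates the direction of the inequalities.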
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control
