Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau; Damien Ernst; Bernard Boigelot; Quentin Louveaux

arXiv:1202.5298 · cs.SY · October 31, 2012

Open Access

TL;DR

This paper shows that the minmax optimization problem for deterministic batch mode reinforcement learning is NP-hard and, for the two-stage case, proposes two relaxation schemes that are proven theoretically and illustrated empirically to give better results than the earlier approach.

Contribution

It introduces two novel relaxation schemes for the two-stage minmax problem in deterministic batch RL, with theoretical guarantees and empirical validation.

Findings

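01 Both relaxation schemes are proven, and shown empirically, to give better results than those of [22].

02 The first relaxation drops some constraints, yielding a problem solvable in polynomial time.

03 The second relaxation dualizes all constraints (Lagrangian relaxation) and leads to a conic quadratic programming problem.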

Abstract

We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].
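The principle behind the first relaxation scheme can be illustrated on a toy problem. The sketch below is not the paper's formulation: the grid, objective, and constraints are hypothetical and chosen only for illustration. It shows the general fact that dropping constraints from a minimization problem enlarges the feasible set, so the relaxed optimum can never exceed the original one and therefore serves as a lower bound.

```python
from itertools import product


def minimize(objective, constraints, grid):
    """Brute-force minimum of `objective` over grid points satisfying every constraint."""
    feasible = [p for p in grid if all(c(p) for c in constraints)]
    return min(objective(p) for p in feasible)


# Hypothetical toy problem (not taken from the paper):
# minimize x + y over integer points in [0, 4]^2.
grid = list(product(range(5), repeat=2))
objective = lambda p: p[0] + p[1]
constraints = [
    lambda p: p[0] + 2 * p[1] >= 4,
    lambda p: 3 * p[0] + p[1] >= 6,
]

# Original problem: all constraints enforced.
full_value = minimize(objective, constraints, grid)

# Relaxed problem: the second constraint is dropped.
relaxed_value = minimize(objective, constraints[:1], grid)

# The relaxed minimum is a lower bound on the original minimum.
print(relaxed_value, "<=", full_value)
```

Here brute-force enumeration stands in for a real solver. In practice, the value of such a relaxation is that the relaxed problem may be much cheaper to solve than the original one, at the cost of a bound that may not be tight.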

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Adaptive Dynamic Programming Control