Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

Kai Wang; Sanket Shah; Haipeng Chen; Andrew Perrault; Finale Doshi-Velez; Milind Tambe

arXiv:2106.03279 · cs.LG · July 19, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a decision-focused learning approach for sequential decision problems modeled as MDPs, using differentiable approximations to improve generalization in reinforcement learning settings.

Contribution

It develops methods to differentiate through MDPs with large state-action spaces, enabling decision-focused training that enhances generalization to unseen tasks.

Findings

01. Decision-focused learning generalizes better to unseen tasks than training on an intermediate prediction loss.

02. Sampled, unbiased derivatives are used to approximate the optimality conditions and differentiate through them.

03. Low-rank approximations reduce the computational cost of high-dimensional derivatives.

Abstract

In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational…
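The gap between an intermediate prediction loss and decision quality can be illustrated with a minimal sketch. This is not the paper's method: it only evaluates two hand-picked reward predictions on a hypothetical 3-state, 2-action tabular MDP, solves each predicted MDP with value iteration, and scores the resulting policy under the true rewards. All names (`value_iteration`, `policy_value`, `decision_quality`, `pred_a`, `pred_b`) and all numbers are assumptions made up for the illustration.

```python
import numpy as np

GAMMA = 0.9

# Hypothetical deterministic chain MDP with states 0, 1, 2.
# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.zeros((2, 3, 3))
P[0, 0, 0] = P[0, 1, 0] = P[0, 2, 1] = 1.0  # action 0: move left (clipped at 0)
P[1, 0, 1] = P[1, 1, 2] = P[1, 2, 2] = 1.0  # action 1: move right (clipped at 2)

R_true = np.array([1.0, 0.0, 2.0])  # true per-state rewards (assumed)

def value_iteration(P, R, gamma=GAMMA, iters=500):
    """Return the greedy deterministic policy for state rewards R."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R[None, :] + gamma * (P @ V)  # shape (actions, states)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def policy_value(P, R, policy, gamma=GAMMA):
    """Exact state values of a deterministic policy: V = (I - gamma P_pi)^-1 R."""
    n = R.shape[0]
    P_pi = P[policy, np.arange(n), :]  # row s is the transition row chosen in s
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R)

def mse(pred, target):
    """Intermediate loss: mean squared error of the predicted parameters."""
    return float(np.mean((pred - target) ** 2))

def decision_quality(pred):
    """Plan on the predicted rewards, then evaluate on the true rewards."""
    policy = value_iteration(P, pred)
    return float(policy_value(P, R_true, policy).mean())  # uniform start state

# Two hand-picked predictions (assumed, not learned from features).
pred_a = np.array([2.0, 1.0, 3.0])  # uniformly shifted: large error, same ranking
pred_b = np.array([1.5, 0.0, 1.4])  # small error, but ranks state 0 above state 2

for name, pred in [("pred_a", pred_a), ("pred_b", pred_b)]:
    print(name, "MSE:", mse(pred, R_true), "return:", decision_quality(pred))
```

In this toy setup `pred_b` has the lower mean squared error yet yields the lower true return, because its errors change which state the planner prefers. That mismatch is the motivation for training the predictive model against decision quality rather than against prediction error alone.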

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms