Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning
Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

TL;DR
This paper introduces a decision-focused learning approach for sequential decision problems modeled as MDPs, using differentiable approximations to improve generalization in reinforcement learning settings.
Contribution
It develops methods to differentiate through MDPs with large state-action spaces, enabling decision-focused training that enhances generalization to unseen tasks.
Findings
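Decision-focused learning generalizes to unseen test MDPs better than training the predictive model on an intermediate prediction loss.
Sampling provably unbiased derivatives makes it possible to approximate and differentiate through the optimality conditions when state and action spaces are large.
A low-rank approximation reduces the cost of working with the high-dimensional sample-based derivatives.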
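To illustrate why a low-rank structure can cut costs in the last finding, here is a minimal sketch, not taken from the paper. It assumes the large d x d matrix that must be inverted has the form lam * I + U U^T, where the k columns of U are sampled derivative vectors and k is much smaller than d. The function name `low_rank_solve` and the damping parameter `lam` are hypothetical; the Woodbury identity used here is one standard way to exploit such structure.

```python
import numpy as np

def low_rank_solve(U, lam, b):
    """Solve (lam * I + U @ U.T) x = b without forming the d x d matrix.

    U   : (d, k) array of k sampled derivative vectors (assumed, k << d)
    lam : positive damping term (hypothetical parameter)
    b   : (d,) right-hand side

    Woodbury identity:
      (lam*I + U U^T)^{-1} = (1/lam) * (I - U (lam*I_k + U^T U)^{-1} U^T)
    Only a k x k system is solved, so the cost is O(d k^2) instead of O(d^3).
    """
    k = U.shape[1]
    small = lam * np.eye(k) + U.T @ U          # k x k matrix
    correction = U @ np.linalg.solve(small, U.T @ b)
    return (b - correction) / lam
```

The result matches a dense solve of the full system up to floating-point error, while memory stays at O(d k) because the d x d matrix is never built.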
Abstract
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational…
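The difference between prediction quality and decision quality can be made concrete on a toy problem. The sketch below is an illustration under strong assumptions, not the authors' method: it uses a tiny tabular MDP with known transitions, a linear reward predictor, and soft value iteration as the solver, whereas the paper targets large MDPs solved by reinforcement learning. All names (`soft_value_iteration`, `policy_return`, `decision_quality`, `temp`) are hypothetical. Decision-focused training would ascend the gradient of `decision_quality` with respect to `w`, which would require an automatic-differentiation library and is not shown.

```python
import numpy as np

def soft_value_iteration(P, R, gamma=0.9, temp=0.1, iters=200):
    """Softmax policy pi[s, a] for transitions P[s, a, s'] and rewards R[s, a]."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)                      # shape (S, A)
        m = Q.max(axis=1)
        # Numerically stable soft maximum over actions.
        V = m + temp * np.log(np.exp((Q - m[:, None]) / temp).sum(axis=1))
    Q = R + gamma * (P @ V)
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / temp)
    return pi / pi.sum(axis=1, keepdims=True)

def policy_return(P, R_true, pi, start, gamma=0.9):
    """Exact expected discounted return of pi under the TRUE rewards."""
    S = R_true.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)            # state-to-state transitions under pi
    r_pi = (pi * R_true).sum(axis=1)                 # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return float(start @ V)

def decision_quality(w, features, P, R_true, start):
    """Predict rewards from features, solve the predicted MDP, score on the true MDP."""
    R_pred = features @ w                            # (S, A, F) @ (F,) -> (S, A)
    pi = soft_value_iteration(P, R_pred)
    return policy_return(P, R_true, pi, start)

# Toy MDP (assumed): 2 states, action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0                        # stay
P[0, 1, 1] = P[1, 1, 0] = 1.0                        # switch
features = np.zeros((2, 2, 1))
features[1, 0, 0] = 1.0                              # one feature: "stay in state 1"
w_true = np.array([1.0])
R_true = features @ w_true
start = np.array([1.0, 0.0])                         # always start in state 0

good = decision_quality(w_true, features, P, R_true, start)
bad = decision_quality(-w_true, features, P, R_true, start)
```

Here `good` scores the policy obtained from correct reward predictions and `bad` scores the policy obtained from sign-flipped predictions, both evaluated on the true rewards. The point of the design is that the training signal is the return of the resulting policy, not the error of `R_pred` against `R_true`.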
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
