Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Sarah Rathnam, Susan A. Murphy, and Finale Doshi-Velez

arXiv:2109.08134·cs.LG·September 17, 2021

Open Access

TL;DR

This paper unifies three regularization methods for batch reinforcement learning in a common framework, a weighted average transition matrix, and shows how the data distribution and MDP structure influence their relative performance. Empirical evaluation across a range of MDPs and data collection policies supports the analysis.

Contribution

It introduces a unified framework for three regularization methods in batch RL, clarifying their differences and interactions within a common mathematical form.

Findings

01. Regularization methods can be compared within a common weighted average transition matrix framework.

02. The effectiveness of each regularization method depends on the MDP structure and data distribution.

03. Empirical results confirm the theoretical insights across multiple MDPs and data policies.

Abstract

In batch reinforcement learning, poorly explored state-action pairs can lead to poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.
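To make the phrase "weighted average transition matrix" concrete, here is a minimal sketch of what such a form might look like for a single action in a small tabular MDP. It is an illustration under stated assumptions, not the paper's construction: the helper names (`mle_transition_matrix`, `weighted_average`), the weight `eps`, the example counts, and the choice of a uniform matrix as the regularizing term are all hypothetical. The abstract does not say which matrix or weight each of the three methods corresponds to.

```python
def mle_transition_matrix(counts):
    """Row-normalize observed transition counts for one action.

    counts[s][s2] is the number of observed transitions s -> s2.
    Assumption for this sketch: a state with no observations gets a uniform row.
    """
    n = len(counts)
    rows = []
    for row in counts:
        total = sum(row)
        if total == 0:
            rows.append([1.0 / n] * n)
        else:
            rows.append([c / total for c in row])
    return rows


def weighted_average(t_hat, t_reg, eps):
    """Return (1 - eps) * t_hat + eps * t_reg, computed entry by entry."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return [
        [(1.0 - eps) * p + eps * q for p, q in zip(row_hat, row_reg)]
        for row_hat, row_reg in zip(t_hat, t_reg)
    ]


# Hypothetical batch data: state 0 is well explored, state 1 was visited
# once, and state 2 was never visited under this action.
counts = [[8, 2, 0], [0, 1, 0], [0, 0, 0]]
t_hat = mle_transition_matrix(counts)

# Assumed regularizing matrix: uniform over next states.
uniform = [[1.0 / 3] * 3 for _ in range(3)]
t_regularized = weighted_average(t_hat, uniform, eps=0.2)
```

Because both inputs are row-stochastic and `eps` lies in [0, 1], every row of `t_regularized` still sums to one. In this sketch the same weight is applied to every state regardless of how often it was visited, which is one way to see why the state-action distribution of the batch could matter when comparing methods in this form.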

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques