# The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

**Authors:** Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury

arXiv: 2303.00694 · 2023-03-02

## TL;DR

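This paper introduces a "lazy" approach to Model-based Reinforcement Learning (MBRL) built on a single objective that covers both policy computation and model fitting. Policy computation becomes cheaper than repeatedly finding a good policy in the learned model, and the model is fit in a way that matches how it is used during policy computation.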

## Contribution

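The paper proposes a unified objective, Performance Difference via Advantage in Model, together with two no-regret algorithms that optimize it. The objective replaces full planning with optimization of the expected policy advantage in the learned model, and fits the model with a value moment matching term aligned with the model's use during policy computation.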

## Key findings

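- Optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, giving a computational speedup over traditional planning methods.
- Statistical and computational gains over existing MBRL methods are demonstrated on simulated benchmarks.
- The unified objective fits the model with a value moment matching term, which aligns model fitting with policy computation.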

## Abstract

We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy under the true dynamics. This objective demonstrates that optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, resulting in a significant boost in computational efficiency compared to traditional planning methods. Additionally, the unified objective uses a value moment matching term for model fitting, which is aligned with the model's usage during policy computation. We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains compared to existing MBRL methods through simulated benchmarks.
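The two quantities named in the abstract can be illustrated on a toy tabular problem. The sketch below is not the paper's algorithm or its exact objective: the function names, the two-state MDP, the uniform distributions `nu` and `mu`, and the squared form of the moment matching loss are all assumptions made for illustration. It computes (a) the expected advantage of a candidate policy, evaluated in a learned model under an exploration distribution, and (b) a value moment matching gap between the true and learned dynamics.

```python
import numpy as np


def evaluate_in_model(P_hat, R, pi, gamma):
    """Exact policy evaluation of `pi` in a tabular model (hypothetical helper).

    P_hat: (S, A, S) transition probabilities of the learned model
    R:     (S, A) rewards
    pi:    (S, A) row-stochastic policy
    """
    n_states = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P_hat)  # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)              # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * (P_hat @ V)                # (S, A) action values in the model
    return V, Q


def expected_advantage(Q, V, pi_new, nu):
    """E_{s ~ nu, a ~ pi_new}[Q(s, a) - V(s)], with Q and V computed in the model."""
    advantage = Q - V[:, None]
    return float(nu @ np.sum(pi_new * advantage, axis=1))


def value_moment_loss(P_true, P_hat, V, mu):
    """Squared gap between expected next-state values under true and learned
    dynamics, weighted by mu(s, a). The squared form is an assumption."""
    gap = P_true @ V - P_hat @ V               # (S, A)
    return float(np.sum(mu * gap ** 2))


# Toy learned model (assumed): action 0 stays, action 1 switches state.
P_hat = np.zeros((2, 2, 2))
P_hat[0, 0, 0] = P_hat[1, 0, 1] = 1.0
P_hat[0, 1, 1] = P_hat[1, 1, 0] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])        # reward 1 whenever in state 1
gamma = 0.5

pi = np.array([[1.0, 0.0], [1.0, 0.0]])       # current policy: always stay
V, Q = evaluate_in_model(P_hat, R, pi, gamma)

nu = np.array([0.5, 0.5])                      # exploration distribution over states
pi_new = np.eye(2)[np.argmax(Q, axis=1)]       # candidate policy: greedy w.r.t. Q
adv_new = expected_advantage(Q, V, pi_new, nu)

# Assumed true dynamics: switching from state 0 only succeeds half the time.
P_true = P_hat.copy()
P_true[0, 1] = [0.5, 0.5]
mu = np.full((2, 2), 0.25)                     # uniform weights over (s, a)
vmm = value_moment_loss(P_true, P_hat, V, mu)
```

In this toy setting the candidate policy has positive expected advantage in the learned model, while the moment matching gap is nonzero only at the state-action pair where the learned model disagrees with the true dynamics about the expected next-state value.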

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2303.00694/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/2303.00694/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/2303.00694/full.md

---
Source: https://tomesphere.com/paper/2303.00694