Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

Abdullah Akgül, Manuel Haußmann, Melih Kandemir

arXiv:2406.04088·cs.LG·January 17, 2025

TL;DR

This paper introduces MOMBO, a deterministic method for propagating uncertainty in model-based offline reinforcement learning, which improves convergence speed and provides tighter suboptimality guarantees compared to Monte Carlo sampling methods.

Contribution

The paper proposes MOMBO, a novel deterministic approach using moment matching for uncertainty propagation, reducing sampling variance and enhancing convergence in offline RL.

Findings

1. MOMBO converges faster than Monte Carlo-based methods.

2. Tighter suboptimality guarantees are achieved with MOMBO.

3. MOMBO outperforms existing approaches on benchmark tasks.

Abstract

Current approaches to model-based offline reinforcement learning often incorporate uncertainty-based reward penalization to address the distributional shift problem. These approaches, commonly known as pessimistic value iteration, use Monte Carlo sampling to estimate the Bellman target for temporal difference-based policy evaluation. We find that the randomness caused by this sampling step significantly delays convergence. We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation. Our main contribution is a deterministic approximation to the Bellman target that uses progressive moment matching, a method developed originally for deterministic variational inference. The resulting algorithm, which we call Moment Matching Offline Model-Based Policy Optimization (MOMBO), propagates…
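To illustrate the contrast the abstract draws, here is a minimal sketch of moment matching through a small ReLU network next to a Monte Carlo estimate of the same quantity. It is a generic illustration, not the authors' implementation: the function names, the two-layer network, and the assumption that the next-state distribution is Gaussian with diagonal covariance (with correlations dropped after each layer) are all hypothetical choices made for this example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise erf without extra dependencies


def normal_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def linear_moments(mean, var, W, b):
    # Assumption: inputs are independent, so variances add with squared weights.
    return W @ mean + b, (W**2) @ var


def relu_moments(mean, var):
    # Closed-form mean and variance of max(0, x) for x ~ N(mean, var).
    std = np.sqrt(np.maximum(var, 1e-12))
    z = mean / std
    m1 = mean * normal_cdf(z) + std * normal_pdf(z)
    m2 = (mean**2 + var) * normal_cdf(z) + mean * std * normal_pdf(z)
    return m1, np.maximum(m2 - m1**2, 0.0)


def value_moments(mean, var, params):
    # Deterministic pass: push (mean, variance) through linear -> ReLU -> linear.
    (W1, b1), (W2, b2) = params
    h_mean, h_var = relu_moments(*linear_moments(mean, var, W1, b1))
    return linear_moments(h_mean, h_var, W2, b2)


def value_monte_carlo(mean, var, params, n_samples, rng):
    # Sampling-based estimate of the same expected output; noisy for small n_samples.
    (W1, b1), (W2, b2) = params
    x = mean + np.sqrt(var) * rng.standard_normal((n_samples, mean.size))
    h = np.maximum(x @ W1.T + b1, 0.0)
    return (h @ W2.T + b2).mean(axis=0)
```

Calling `value_moments` twice on the same inputs returns identical numbers, whereas `value_monte_carlo` fluctuates from call to call unless `n_samples` is large. That sampling noise in the target is the effect the abstract identifies as slowing convergence.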

Code & Models

Repositories

adinlab/MOMBO
PyTorch · Official

Videos

Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Fault Detection and Control Systems · Smart Grid Security and Resilience · Software Reliability and Analysis Research

Methods: Sparse Evolutionary Training