Towards Data-Driven Offline Simulations for Online Reinforcement Learning

Shengpu Tang; Felipe Vieira Frujeri; Dipendra Misra; Alex Lamb; John Langford; Paul Mineiro; Sebastian Kochman

arXiv:2211.07614·cs.LG·November 15, 2022

TL;DR

This paper formalizes offline learner simulation for reinforcement learning: using historical data to evaluate agents that keep learning after deployment, including in environments with high-dimensional observations, so that practitioners can assess such adaptive agents before deploying them to production systems.

Contribution

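The paper formalizes offline learner simulation (OLS) for RL, proposes an evaluation protocol that measures both the fidelity and the efficiency of a simulation, and proposes a semi-parametric approach that uses latent state discovery to handle complex high-dimensional observations.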

Findings

01 Semi-parametric approach outperforms non-parametric baselines

02 Improved fidelity and efficiency in offline RL simulation

03 Preliminary experiments validate the approach's advantages

Abstract

Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather than a fixed policy) to a production system, as it's perceived as unsafe. Using historical data to reason about learning algorithms, similar to offline policy evaluation (OPE) applied to fixed policies, could help practitioners evaluate and ultimately deploy such adaptive agents to production. In this work, we formalize offline learner simulation (OLS) for reinforcement learning (RL) and propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation. For environments with complex high-dimensional observations, we propose a semi-parametric approach that leverages recent advances in latent state discovery in order to achieve…
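To make the idea of simulating a learner from historical data concrete, here is a minimal tabular sketch. It is an illustration only, not the paper's semi-parametric method: it assumes small discrete states and actions, logged transitions stored as `(state, action, reward, next_state)` tuples, and a learner exposed through two hypothetical callbacks, `choose_action` and `update`. Each logged transition is replayed at most once, and the simulation stops when the learner requests a state-action pair with no unused data left.

```python
from collections import defaultdict, deque


def build_buffers(logged):
    """Group logged (state, action, reward, next_state) tuples by (state, action)."""
    buffers = defaultdict(deque)
    for state, action, reward, next_state in logged:
        buffers[(state, action)].append((reward, next_state))
    return buffers


def simulate_learner(logged, choose_action, update, start_state, max_steps=1000):
    """Replay logged transitions in response to a learner's own action choices.

    choose_action(state) -> action and update(state, action, reward, next_state)
    are hypothetical learner callbacks introduced for this sketch.
    Returns the list of simulated transitions.
    """
    buffers = build_buffers(logged)
    state = start_state
    history = []
    for _ in range(max_steps):
        action = choose_action(state)
        queue = buffers.get((state, action))
        if not queue:
            # No unused logged transition matches this request: stop here.
            break
        reward, next_state = queue.popleft()  # each logged sample is used once
        update(state, action, reward, next_state)
        history.append((state, action, reward, next_state))
        state = next_state
    return history


if __name__ == "__main__":
    logged = [(0, 1, 0.0, 1), (1, 0, 1.0, 0), (0, 1, 0.0, 1)]
    trace = simulate_learner(
        logged,
        choose_action=lambda s: 1 if s == 0 else 0,  # fixed rule standing in for a learner
        update=lambda s, a, r, s_next: None,         # a real learner would update itself here
        start_state=0,
    )
    print(len(trace))
```

Under these assumptions, the number of simulated steps before the data runs out is one rough proxy for how efficiently the logged data is used; the abstract names fidelity and efficiency as the two evaluation axes but does not define them in the excerpt above, so this proxy is an assumption of the sketch.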

Code & Models

Repositories

microsoft/rl-offline-simulation (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI)