Towards Data-Driven Offline Simulations for Online Reinforcement Learning
Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

TL;DR
This paper introduces an offline simulation method for reinforcement learning that uses historical data to evaluate adaptive (learning) agents in environments with high-dimensional observations. The goal is to make deploying such agents to real-world systems safer.
Contribution
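It formalizes offline learner simulation (OLS) for RL and proposes an evaluation protocol that measures both the fidelity and the efficiency of a simulation. It also introduces a semi-parametric approach that leverages latent state discovery to make offline simulation accurate and efficient.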
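To make offline learner simulation concrete, here is an illustrative sketch under stated assumptions. It is not the authors' implementation. It replays logged transitions to a learning agent in the style of queue-based replay: transitions are grouped by (state, action), and the simulation stops when the learner requests a pair that has no unused data left. The `encode` function stands in for the state abstraction. An identity map on raw observations gives a purely non-parametric simulator, while a learned latent-state encoder is closer in spirit to the semi-parametric approach. The names `build_queues`, `simulate_learner`, `encode`, and the learner's `act`/`update` interface are all hypothetical.

```python
import random
from collections import defaultdict, deque
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical logged-transition format: (obs, action, reward, next_obs)
Transition = Tuple[object, int, float, object]


def build_queues(logged: List[Transition],
                 encode: Callable[[object], Hashable],
                 seed: int = 0) -> Dict[Tuple[Hashable, int], deque]:
    """Bucket logged transitions by (encoded state, action), shuffled within each bucket."""
    buckets = defaultdict(list)
    for obs, action, reward, next_obs in logged:
        buckets[(encode(obs), action)].append((reward, next_obs))
    rng = random.Random(seed)
    queues = {}
    for key, items in buckets.items():
        rng.shuffle(items)  # avoid always replaying transitions in logging order
        queues[key] = deque(items)
    return queues


def simulate_learner(learner, logged: List[Transition],
                     encode: Callable[[object], Hashable],
                     start_obs, max_steps: int, seed: int = 0) -> List[float]:
    """Feed logged data to a learning agent until a requested transition is unavailable.

    Assumption: observations that `encode` maps to the same state are interchangeable,
    so any unused logged transition from that (state, action) pair may be replayed.
    """
    queues = build_queues(logged, encode, seed)
    obs, rewards = start_obs, []
    for _ in range(max_steps):
        action = learner.act(obs)
        queue = queues.get((encode(obs), action))
        if not queue:  # missing or exhausted bucket: the simulation must stop here
            break
        reward, next_obs = queue.popleft()  # each logged transition is used at most once
        learner.update(obs, action, reward, next_obs)
        rewards.append(reward)
        obs = next_obs
    return rewards
```

Coarser latent states pool more logged transitions per bucket, so the simulation can run longer before a queue empties. The cost is that more distinct observations are treated as interchangeable.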
Findings
Semi-parametric approach outperforms non-parametric baselines
Improved fidelity and efficiency in offline RL simulation
Preliminary experiments validate the approach's advantages
Abstract
Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather than a fixed policy) to a production system, as it's perceived as unsafe. Using historical data to reason about learning algorithms, similar to offline policy evaluation (OPE) applied to fixed policies, could help practitioners evaluate and ultimately deploy such adaptive agents to production. In this work, we formalize offline learner simulation (OLS) for reinforcement learning (RL) and propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation. For environments with complex high-dimensional observations, we propose a semi-parametric approach that leverages recent advances in latent state discovery in order to achieve…
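The abstract describes a protocol that measures both fidelity and efficiency, but the truncated text does not define the metrics. Below is one hypothetical way to quantify them, assuming a reference run of the same learner in the real environment is available:

- **Fidelity:** the mean absolute gap between the simulated and reference learning curves, where each curve is the running average reward.
- **Efficiency:** the number of simulated steps obtained per logged transition.

These definitions are illustrative assumptions, not the paper's, and the function names are hypothetical.

```python
from typing import List, Sequence


def learning_curve(rewards: Sequence[float]) -> List[float]:
    """Running average reward after each step."""
    curve, total = [], 0.0
    for t, r in enumerate(rewards, start=1):
        total += r
        curve.append(total / t)
    return curve


def fidelity_error(simulated: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical fidelity metric: mean absolute gap between learning curves
    over their common prefix (lower means the simulation is more faithful)."""
    sim, ref = learning_curve(simulated), learning_curve(reference)
    n = min(len(sim), len(ref))
    if n == 0:
        raise ValueError("both reward sequences must be non-empty")
    return sum(abs(a - b) for a, b in zip(sim[:n], ref[:n])) / n


def efficiency(num_simulated_steps: int, num_logged_transitions: int) -> float:
    """Hypothetical efficiency metric: simulated steps per logged transition."""
    if num_logged_transitions <= 0:
        raise ValueError("need at least one logged transition")
    return num_simulated_steps / num_logged_transitions
```

Under these definitions, the two metrics can pull against each other. A simulator that stops early has low efficiency but may still look faithful over the short prefix it produces.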
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Explainable Artificial Intelligence (XAI)
