Synthetic Monitoring Environments for Reinforcement Learning
Leonard Pleiss, Carolin Schmidt, Maximilian Schiffer

TL;DR
This paper introduces Synthetic Monitoring Environments (SMEs), a flexible and transparent suite of continuous control tasks with known optimal policies, enabling precise diagnostics and systematic evaluation of reinforcement learning algorithms.
Contribution
The paper presents SMEs, a novel framework providing configurable environments with ground-truth optimality metrics for rigorous RL evaluation and analysis.
Findings
SMEs enable exact calculation of instantaneous regret.
Environmental properties such as action or state space size, reward sparsity, and complexity of the optimal policy affect the performance of PPO, TD3, and SAC.
SMEs facilitate systematic within-distribution (WD) and out-of-distribution (OOD) evaluation through rigorous geometric state space bounds.
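
To make the first finding concrete, here is a minimal sketch of how instantaneous regret can be computed once the optimal policy is known. It is not the paper's implementation: the `optimal_policy` map, the quadratic `reward`, and the per-step reward-gap definition of regret are all assumptions chosen for illustration.

```python
import math
import random


def optimal_policy(state):
    """Hypothetical known optimal policy: a fixed nonlinear map from state to action."""
    return [math.tanh(x) for x in state]


def reward(state, action):
    """Toy reward (assumption): negative squared distance to the optimal action."""
    target = optimal_policy(state)
    return -sum((a - t) ** 2 for a, t in zip(action, target))


def instantaneous_regret(state, action):
    """Per-step gap between the reward of the optimal action and the agent's action."""
    return reward(state, optimal_policy(state)) - reward(state, action)


rng = random.Random(0)
state = [rng.uniform(-1.0, 1.0) for _ in range(3)]
agent_action = [0.0, 0.0, 0.0]  # a deliberately naive agent
print(instantaneous_regret(state, agent_action))
```

Because the optimal action is available in closed form, the regret is exact rather than estimated against a best-known baseline; it is zero only when the agent reproduces the optimal action.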
Abstract
Reinforcement Learning (RL) lacks benchmarks that enable precise, white-box diagnostics of agent behavior. Current environments often entangle complexity factors and lack ground-truth optimality metrics, making it difficult to isolate why algorithms fail. We introduce Synthetic Monitoring Environments (SMEs), an infinite suite of continuous control tasks. SMEs provide fully configurable task characteristics and known optimal policies. As such, SMEs allow for the exact calculation of instantaneous regret. Their rigorous geometric state space bounds allow for systematic within-distribution (WD) and out-of-distribution (OOD) evaluation. We demonstrate the framework's benefit through multidimensional ablations of PPO, TD3, and SAC, revealing how specific environmental properties, such as action or state space size, reward sparsity, and complexity of the optimal policy, impact WD and OOD…
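
The WD/OOD split described in the abstract can be illustrated with a short sketch. The abstract does not specify the geometry of the bounds, so the hypercube training region, the `bound` and `outer` parameters, and all function names below are hypothetical.

```python
import random


def is_within_distribution(state, bound=1.0):
    """A state is WD if every coordinate lies inside the assumed training hypercube."""
    return all(abs(x) <= bound for x in state)


def sample_wd(rng, dim, bound=1.0):
    """Draw a state uniformly from the assumed training hypercube [-bound, bound]^dim."""
    return [rng.uniform(-bound, bound) for _ in range(dim)]


def sample_ood(rng, dim, bound=1.0, outer=2.0):
    """Rejection-sample a state from the larger box that falls outside the training region."""
    while True:
        state = [rng.uniform(-outer, outer) for _ in range(dim)]
        if not is_within_distribution(state, bound):
            return state


rng = random.Random(0)
wd_states = [sample_wd(rng, dim=4) for _ in range(100)]
ood_states = [sample_ood(rng, dim=4) for _ in range(100)]
print(sum(is_within_distribution(s) for s in wd_states))   # count of WD states flagged WD
print(sum(is_within_distribution(s) for s in ood_states))  # count of OOD states flagged WD
```

With explicit bounds, membership in the training region is a deterministic check, so evaluation episodes can be assigned to WD or OOD without ambiguity.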
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
