Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies

Sinan Ibrahim; Grégoire Ouerdane; Hadi Salloum; Henni Ouerdane; Stefan Streif; Pavel Osinenko

arXiv:2603.17631·cs.LG·March 19, 2026

TL;DR

This paper introduces a rigorous benchmarking framework for reinforcement learning that constructs controlled environments with known optimal policies, enabling precise evaluation of algorithms across diverse scenarios.

Contribution

The paper extends converse optimality to discrete-time, control-affine, nonlinear stochastic systems, allowing systematic generation of benchmark environments whose optimal policies are known by construction.

Findings

01 Framework enables controlled environment generation

02 Demonstrates capacity for comprehensive RL evaluation

03 Provides a reproducible basis for benchmarking

Abstract

Objectively comparing Reinforcement Learning (RL) algorithms is notoriously difficult because benchmark outcomes are critically sensitive to environment design, reward structure, and the stochasticity inherent in both the learning algorithm and the environment dynamics. To manage this complexity, we introduce a rigorous benchmarking framework by extending converse optimality to discrete-time, control-affine, nonlinear systems with noise. Our framework provides necessary and sufficient conditions under which a prescribed value function and policy are optimal for the constructed systems, enabling the systematic generation of benchmark families via homotopy variations and randomized parameters. We validate it by automatically constructing diverse environments, demonstrating our framework's capacity for a controlled and comprehensive evaluation…
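To make the converse idea concrete, here is a minimal sketch in the simplest special case: a linear system with additive noise, a quadratic value function, and a discounted quadratic cost. This is not the paper's construction, which covers nonlinear control-affine systems; the discount factor `gamma`, the function names, and all numeric values below are assumptions chosen for illustration. Instead of solving for the value function given a cost, the sketch prescribes `P_star` and derives the state cost `Q` that makes V(x) = x'P_star x + c satisfy the Bellman equation, then checks the result with ordinary value iteration.

```python
import numpy as np

def converse_lq_cost(A, B, P, R, gamma):
    """Given a prescribed quadratic value matrix P, return (Q, K) such that
    V(x) = x'Px + c satisfies the discounted Bellman equation for
    x+ = Ax + Bu + w with stage cost x'Qx + u'Ru (illustrative helper)."""
    S = R + gamma * B.T @ P @ B                    # curvature in u of the Bellman right-hand side
    K = gamma * np.linalg.solve(S, B.T @ P @ A)    # minimizing policy is u = -K x
    Q = P - gamma * A.T @ P @ A + gamma * A.T @ P @ B @ K
    return Q, K

def riccati_value_iteration(A, B, Q, R, gamma, iters=2000):
    """Forward check: recover the value matrix from the cost by value iteration."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        S = R + gamma * B.T @ P @ B
        K = gamma * np.linalg.solve(S, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
    return P

# Assumed toy system; every number here is hypothetical.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
P_star = np.eye(2)           # prescribed value function V(x) = x'x + c
Sigma = 0.01 * np.eye(2)     # covariance of the zero-mean additive noise w
gamma = 0.95

Q, K = converse_lq_cost(A, B, P_star, R, gamma)
# Additive noise only shifts the value function by a constant offset.
c = gamma * np.trace(P_star @ Sigma) / (1.0 - gamma)
P_vi = riccati_value_iteration(A, B, Q, R, gamma)

print("min eigenvalue of Q:", np.linalg.eigvalsh(Q).min())
print("max |P_vi - P_star|:", np.abs(P_vi - P_star).max())
```

The derived `Q` is not positive semidefinite for every choice of `P_star`, so the eigenvalue check is part of the construction: a prescribed value function only yields a well-posed benchmark when the resulting cost is admissible. In this linear sketch, noise affects only the offset `c` and leaves the optimal gain `K` unchanged.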

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications