Off Environment Evaluation Using Convex Risk Minimization
Pulkit Katdare, Shuijing Liu, Katherine Driggs-Campbell

TL;DR
This paper introduces a convex risk minimization approach that estimates the model mismatch between a simulator and the real world from trajectory data collected in both environments, then uses that estimate to evaluate reinforcement learning agents in the target environment.
Contribution
The paper proposes a novel convex risk minimization algorithm to quantify model mismatch and evaluate RL agent performance across simulated and real environments.
Findings
Effective performance estimation in multiple simulated environments.
Accurate performance prediction for a real robotic arm using remote data.
Convergence rate of the estimator is of order n^{-1/4}.
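As a rough reading of that rate (assuming, hypothetically, that the estimation error is bounded by a constant C times n^{-1/4}, where n is the number of samples; the exact norm and constants are not given here), the implied sample cost of a more accurate estimate can be worked out as:

```latex
\[
  \varepsilon(n) \le C\, n^{-1/4}
  \quad\Longrightarrow\quad
  \frac{\varepsilon(kn)}{\varepsilon(n)} \approx k^{-1/4},
  \qquad\text{so halving the error needs } k = 2^{4} = 16 \text{ times as many samples.}
\]
```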
Abstract
Applying reinforcement learning (RL) methods on robots typically involves training a policy in simulation and deploying it on a robot in the real world. Because of the model mismatch between the real world and the simulator, RL agents deployed in this manner tend to perform suboptimally. To tackle this problem, researchers have developed robust policy learning algorithms that rely on synthetic noise disturbances. However, such methods do not guarantee performance in the target environment. We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain using trajectory data from both environments. We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain, effectively bridging the gap between these two environments. We also show that the convergence rate of our…
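The idea of estimating a mismatch by convex risk minimization and then reweighting simulator data can be illustrated with a toy sketch. This is not the paper's estimator: it assumes one-dimensional "states" drawn from two Gaussians instead of real trajectories, swaps in the logistic loss as a familiar convex risk, and uses hypothetical names (`fit_log_ratio`, `density_ratio`, `reward`) chosen only for illustration.

```python
import numpy as np

def fit_log_ratio(x_sim, x_tgt, iters=25):
    """Fit a + b*x by minimizing the convex logistic loss (sim=0, target=1)."""
    x = np.concatenate([x_sim, x_tgt])
    phi = np.stack([np.ones_like(x), x], axis=1)        # features [1, x]
    y = np.concatenate([np.zeros(len(x_sim)), np.ones(len(x_tgt))])
    theta = np.zeros(2)
    for _ in range(iters):                               # Newton's method
        p = 1.0 / (1.0 + np.exp(-phi @ theta))
        grad = phi.T @ (p - y) / len(y)
        hess = (phi * (p * (1.0 - p))[:, None]).T @ phi / len(y)
        theta -= np.linalg.solve(hess + 1e-8 * np.eye(2), grad)
    return theta

def density_ratio(theta, x, n_sim, n_tgt):
    """Turn classifier logits into an estimate of target density / sim density."""
    return np.exp(theta[0] + theta[1] * x) * (n_sim / n_tgt)

# Assumed toy data: the "simulator" and the "target" differ by a mean shift.
rng = np.random.default_rng(0)
x_sim = rng.normal(0.0, 1.0, 5000)
x_tgt = rng.normal(0.5, 1.0, 5000)

theta = fit_log_ratio(x_sim, x_tgt)
w = density_ratio(theta, x_sim, len(x_sim), len(x_tgt))

def reward(x):
    # Hypothetical per-sample performance measure.
    return x

naive = reward(x_sim).mean()                             # ignores the mismatch
est = np.sum(w * reward(x_sim)) / np.sum(w)              # self-normalized reweighting
print(f"naive={naive:.3f}  reweighted={est:.3f}  target mean=0.5")
```

In this toy setting the true log-ratio is linear in x, so the logistic model is well specified; with real trajectories the ratio would need a richer function class, which is where the paper's analysis applies.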
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research
