Off Environment Evaluation Using Convex Risk Minimization

Pulkit Katdare; Shuijing Liu; Katherine Driggs-Campbell

arXiv:2112.11532·cs.RO·December 23, 2021

TL;DR

This paper introduces a convex risk minimization approach to estimate and evaluate the performance gap between simulation and real-world environments for reinforcement learning agents, using trajectory data.

Contribution

The paper proposes a novel convex risk minimization algorithm to quantify model mismatch and evaluate RL agent performance across simulated and real environments.

Findings

01. Effective performance estimation in multiple simulated environments.

02. Accurate performance prediction for a real robotic arm using remote data.

03. Convergence rate of the estimator is of order n^{-1/4}.

Abstract

Applying reinforcement learning (RL) methods on robots typically involves training a policy in simulation and deploying it on a robot in the real world. Because of the model mismatch between the real world and the simulator, RL agents deployed in this manner tend to perform suboptimally. To tackle this problem, researchers have developed robust policy learning algorithms that rely on synthetic noise disturbances. However, such methods do not guarantee performance in the target environment. We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain using trajectory data from both environments. We show that this estimator can be used along with the simulator to evaluate the performance of RL agents in the target domain, effectively bridging the gap between these two environments. We also show that the convergence rate of our…
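The general idea in the abstract, estimating a mismatch term from samples of both environments by minimizing a convex risk and then using it to reweight simulator data, can be illustrated with a toy sketch. The code below is not the paper's estimator: it swaps in regularized least-squares density-ratio fitting, a standard convex formulation, and uses scalar Gaussian samples as stand-ins for transitions. The function names, the Gaussian kernel features, the `sigma` and `lam` values, and the `reward` function are all assumptions made for illustration.

```python
import numpy as np

def gaussian_features(x, centers, sigma):
    """Kernel features phi_j(x) = exp(-(x - c_j)^2 / (2 * sigma^2))."""
    diff = x[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / sigma) ** 2)

def fit_density_ratio(x_sim, x_target, centers, sigma=1.0, lam=0.1):
    """Fit w(x) ~ p_target(x) / p_sim(x) with w(x) = phi(x) @ theta.

    Minimizes the convex (quadratic) empirical risk
        0.5 * mean_sim[w(x)^2] - mean_target[w(x)] + 0.5 * lam * ||theta||^2,
    whose population minimizer over all functions is the true ratio.
    """
    phi_sim = gaussian_features(x_sim, centers, sigma)
    phi_tgt = gaussian_features(x_target, centers, sigma)
    H = phi_sim.T @ phi_sim / len(x_sim)   # second moment under the simulator
    h = phi_tgt.mean(axis=0)               # first moment under the target
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)

    def ratio(x):
        # Clip at zero because a density ratio cannot be negative.
        return np.clip(gaussian_features(x, centers, sigma) @ theta, 0.0, None)

    return ratio

# Toy data (assumption): the "simulator" and "target" differ by a mean shift.
rng = np.random.default_rng(0)
x_sim = rng.normal(0.0, 1.0, size=2000)
x_target = rng.normal(0.5, 1.0, size=2000)
centers = x_target[:50]                    # kernel centers taken from target data

ratio = fit_density_ratio(x_sim, x_target, centers)
w = ratio(x_sim)

def reward(x):
    """Hypothetical per-sample reward, used only for this illustration."""
    return x

sim_estimate = reward(x_sim).mean()                       # naive simulator value
target_estimate = np.sum(w * reward(x_sim)) / np.sum(w)   # reweighted simulator value
target_truth = reward(x_target).mean()                    # reference from target data

print(f"simulator only : {sim_estimate:.3f}")
print(f"reweighted     : {target_estimate:.3f}")
print(f"target samples : {target_truth:.3f}")
```

The reweighted estimate uses only simulator rewards plus the fitted ratio, which mirrors the abstract's claim that the estimator can be combined with the simulator to evaluate performance in the target domain. The self-normalized form (dividing by the sum of the weights) is a design choice here that makes the estimate insensitive to the overall scale of the fitted ratio.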

Code & Models

Repositories

pulkitkatdare/offenveval
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research