Marginalized Importance Sampling for Off-Environment Policy Evaluation
Pulkit Katdare, Nan Jiang, Katherine Driggs-Campbell

TL;DR
This paper introduces a marginalized importance sampling (MIS) method that combines a simulator with real-world offline data to evaluate policies before real-world deployment, addressing two key challenges in density ratio estimation.
Contribution
It proposes a two-step density ratio learning approach built on the target policy's occupancy in the simulator, improving the efficiency and robustness of off-environment policy evaluation.
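One way to write the two-step idea down, in notation we assume here rather than take from the paper: let d^π_real and d^π_sim be the normalized discounted occupancies of the target policy π in the real system and in the simulator, and let d^D be the distribution of the real-world offline data. The MIS ratio can then be factored through the simulator occupancy, and the policy value recovered from the offline data:

```latex
w(s,a) \;=\; \frac{d^{\pi}_{\mathrm{real}}(s,a)}{d^{D}(s,a)}
\;=\; \underbrace{\frac{d^{\pi}_{\mathrm{real}}(s,a)}{d^{\pi}_{\mathrm{sim}}(s,a)}}_{\text{real vs. simulator}}
\cdot \underbrace{\frac{d^{\pi}_{\mathrm{sim}}(s,a)}{d^{D}(s,a)}}_{\text{simulator vs. offline data}},
\qquad
J(\pi) \;=\; \frac{1}{1-\gamma}\,\mathbb{E}_{(s,a)\sim d^{D}}\!\left[w(s,a)\,r(s,a)\right]
```

The factorization requires d^π_sim(s,a) > 0 and d^D(s,a) > 0 wherever d^π_real(s,a) > 0. The second factor compares two distributions that can both be sampled (simulator rollouts of π and the logged data). The first compares the real system against the simulator under the same policy.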
Findings
Generalizes well across different Sim2Sim gaps
Achieves accurate policy evaluation across various environments
Demonstrates successful validation on a real-world robotic arm
Abstract
Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots. Even a robust policy trained in simulation requires a real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio needs to be inferred indirectly, thus exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an…
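The MIS estimate described in the abstract can be sketched on a toy tabular problem. The state-action pairs, occupancies, rewards, and discount below are hypothetical numbers chosen for illustration, and the helper names `two_step_ratio` and `mis_estimate` are ours, not the paper's. The ratio is computed as the product of a real-vs-simulator factor and a simulator-vs-offline-data factor. Because every occupancy is known exactly here, that product collapses to the direct ratio d^π_real/d^D. In practice each factor would have to be estimated from data, so the only error in this sketch is Monte Carlo noise from sampling the offline data.

```python
import math
import random

# Hypothetical toy problem: 4 state-action pairs with hand-picked normalized
# occupancies (each distribution sums to 1). Illustrative numbers only.
PAIRS = ["s0a0", "s0a1", "s1a0", "s1a1"]
d_real_pi = {"s0a0": 0.40, "s0a1": 0.10, "s1a0": 0.30, "s1a1": 0.20}  # target policy, real world
d_sim_pi = {"s0a0": 0.38, "s0a1": 0.12, "s1a0": 0.28, "s1a1": 0.22}   # target policy, simulator
d_data = {"s0a0": 0.25, "s0a1": 0.25, "s1a0": 0.25, "s1a1": 0.25}     # real-world offline data
reward = {"s0a0": 1.0, "s0a1": 0.0, "s1a0": 0.5, "s1a1": 2.0}
GAMMA = 0.9

# Sanity check: every occupancy must be a probability distribution.
for name, dist in [("d_real_pi", d_real_pi), ("d_sim_pi", d_sim_pi), ("d_data", d_data)]:
    assert math.isclose(sum(dist.values()), 1.0), f"{name} must sum to 1"


def two_step_ratio(sa):
    """w(s,a) = [d_real^pi / d_sim^pi] * [d_sim^pi / d^D], routed through the simulator."""
    real_over_sim = d_real_pi[sa] / d_sim_pi[sa]
    sim_over_data = d_sim_pi[sa] / d_data[sa]
    return real_over_sim * sim_over_data


def mis_estimate(samples):
    """Plug-in MIS estimate of J(pi) = E_{d^D}[w * r] / (1 - gamma) from offline samples."""
    avg = sum(two_step_ratio(sa) * reward[sa] for sa in samples) / len(samples)
    return avg / (1.0 - GAMMA)


# Ground truth: expected reward under the real-world occupancy of pi.
true_value = sum(d_real_pi[sa] * reward[sa] for sa in PAIRS) / (1.0 - GAMMA)

# Draw offline (s, a) samples from d^D with a fixed seed for reproducibility.
rng = random.Random(0)
samples = rng.choices(PAIRS, weights=[d_data[sa] for sa in PAIRS], k=100_000)
estimate = mis_estimate(samples)
print(f"true J = {true_value:.3f}, MIS estimate = {estimate:.3f}")
```

In this toy the true value is 9.5, and the real-vs-simulator factor stays between 0.8 and 1.2 because the simulated occupancy was chosen to be close to the real one.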
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Ethics and Social Impacts of AI
