Environment Probing Interaction Policies
Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta

TL;DR
This paper introduces the Environment-Probing Interaction (EPI) policy, which first interacts with a new environment to gather environment-specific information. A task policy conditioned on that information adapts its actions to each environment and outperforms policies trained to be invariant across environments.
Contribution
The paper proposes a novel environment probing policy that extracts environment-specific information to improve generalization, using a transition predictability reward for training.
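As a rough illustration of what a transition predictability reward could look like, the sketch below scores a set of transitions by how much an environment-conditioned predictor reduces next-state prediction error relative to an unconditioned baseline. The exact reward formula is not given in this summary, so the difference-of-errors form, the function names, and the toy predictors are all assumptions made for illustration.

```python
def prediction_error(predict, transitions):
    """Mean squared error of a next-state predictor over (s, a, s') tuples."""
    return sum((predict(s, a) - s2) ** 2 for s, a, s2 in transitions) / len(transitions)


def predictability_reward(baseline_predict, conditioned_predict, transitions):
    """Assumed reward form: error reduction gained by conditioning on the environment."""
    return (prediction_error(baseline_predict, transitions)
            - prediction_error(conditioned_predict, transitions))


# Toy transitions from a hypothetical 1-D environment where s' = s + 2 * a.
transitions = [(0.0, 1.0, 2.0), (2.0, 1.0, 4.0)]

# Baseline predictor knows nothing about the environment and assumes s' = s + a.
baseline = lambda s, a: s + a

# Conditioned predictor uses an (assumed) environment embedding z = 2.0.
z = 2.0
conditioned = lambda s, a: s + z * a

reward = predictability_reward(baseline, conditioned, transitions)
```

In this toy case the reward is positive because the embedding makes transitions easier to predict. Probing that yields an uninformative embedding would give a reward near zero.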
Findings
EPI policies outperform standard generalization methods in novel environments.
Environment-specific probing improves task performance.
The transition predictability reward provides an effective training signal for the probing policy.
Abstract
A key challenge in reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. A common approach to improve inter-environment transfer is to learn policies that are invariant to the distribution of testing environments. However, we argue that instead of being invariant, the policy should identify the specific nuances of an environment and exploit them to achieve better performance. In this work, we propose the 'Environment-Probing' Interaction (EPI) policy, a policy that probes a new environment to extract an implicit understanding of that environment's behavior. Once this environment-specific information is obtained, it is used as an additional input to a task-specific policy that can now perform environment-conditioned actions to solve a task. To learn these…
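The probe-then-act structure described in the abstract can be sketched in a few lines. This is a minimal toy, not the authors' implementation: the `PointMassEnv` class, the fixed probing actions, and the hand-written `embed` function are hypothetical stand-ins for what would be a learned probing policy and a learned embedding network.

```python
class PointMassEnv:
    """Hypothetical 1-D environment whose hidden `gain` differs per instance."""

    def __init__(self, gain):
        self.gain = gain
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += self.gain * action
        return self.pos


def probe(env, n_steps=3):
    """Stand-in probing policy: apply a unit action and record transitions."""
    state = env.reset()
    trajectory = []
    for _ in range(n_steps):
        action = 1.0
        next_state = env.step(action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


def embed(trajectory):
    """Stand-in for a learned embedding: mean displacement per unit action."""
    return sum((s2 - s) / a for s, a, s2 in trajectory) / len(trajectory)


def task_policy(state, target, env_embedding):
    """Environment-conditioned action: scale the command by the embedding."""
    return (target - state) / env_embedding


def run_episode(gain, target=5.0):
    env = PointMassEnv(gain)
    z = embed(probe(env))          # phase 1: probe the environment
    state = env.reset()            # phase 2: solve the task using z
    action = task_policy(state, target, z)
    return env.step(action)
```

Because the task policy receives the embedding as an extra input, the same policy reaches the target in environments with different hidden gains, whereas a single environment-agnostic action would overshoot in some and undershoot in others.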
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Machine Learning and Data Classification
