Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments
Visak Kumar, Sehoon Ha, C. Karen Liu

TL;DR
This paper presents an error-aware policy learning method enabling zero-shot transfer of policies to new environments with unobservable factors, demonstrated on assistive walking devices and standard RL tasks.
Contribution
The paper introduces an error-aware policy (EAP) that explicitly accounts for unobservable factors, enabling zero-shot generalization in partially observable dynamic environments.
Findings
EAP successfully transfers to different human agents with unseen biomechanics.
The method generalizes to standard RL control tasks.
EAP improves robustness in sim-to-real transfer scenarios.
Abstract
Simulation provides a safe and efficient way to generate useful data for learning complex robotic tasks. However, matching simulation and real-world dynamics can be quite challenging, especially for systems that have a large number of unobserved or unmeasurable parameters, which may lie in the robot dynamics itself or in the environment with which the robot interacts. We introduce a novel approach to tackle such a sim-to-real problem by developing policies capable of adapting to new environments, in a zero-shot manner. Key to our approach is an error-aware policy (EAP) that is explicitly made aware of the effect of unobservable factors during training. An EAP takes as input the predicted future state error in the target environment, which is provided by an error-prediction function, simultaneously trained with the EAP. We validate our approach on an assistive walking device trained to…
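The data flow the abstract describes (an error-prediction function whose output is fed to the policy as an extra input) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the dimensions, the two-layer networks, the helper names `init_mlp`, `mlp`, `predict_error`, and `eap_action`, and the choice of state and previous action as inputs to the error predictor. The joint training of the two functions is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 16

def init_mlp(in_dim, out_dim, hidden=HIDDEN):
    """Randomly initialise a two-layer network (untrained placeholder)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

# Assumed error-prediction function: (state, previous action) -> predicted
# future state error in the target environment.
error_params = init_mlp(STATE_DIM + ACTION_DIM, STATE_DIM)

# Assumed error-aware policy: (state, predicted error) -> action.
policy_params = init_mlp(STATE_DIM + STATE_DIM, ACTION_DIM)

def predict_error(state, prev_action):
    """Predict the future state error from the current state and last action."""
    return mlp(error_params, np.concatenate([state, prev_action]))

def eap_action(state, prev_action):
    """Condition the policy on the state plus the predicted state error."""
    predicted_error = predict_error(state, prev_action)
    return mlp(policy_params, np.concatenate([state, predicted_error]))

action = eap_action(np.zeros(STATE_DIM), np.zeros(ACTION_DIM))
print(action.shape)
```

The point of the sketch is only the wiring: the policy never sees the unobservable factors directly, only a prediction of the state error they are expected to cause.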
Taxonomy
Methods: Attentive Walk-Aggregating Graph Neural Network
