Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization
Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li

TL;DR
This paper introduces SCPO, a model-free reinforcement learning algorithm that learns robust policies against unknown disturbances in transition dynamics without prior disturbance modeling or specialized simulators.
Contribution
The paper proposes state-conservative policy optimization (SCPO), a model-free actor-critic method that learns policies robust to unknown transition disturbances by reducing the disturbance in transition dynamics to one in state space.
Findings
SCPO outperforms baseline methods in robot control tasks.
SCPO does not require prior disturbance knowledge or specialized simulators.
SCPO is simple to implement and effective against transition disturbances.
Abstract
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm -- namely, state-conservative policy optimization (SCPO) -- to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then…
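The idea of treating a disturbance as a perturbation in state space can be illustrated with a small sketch. It is not taken from the paper: the toy value function, the L2 ball of radius `eps`, and the first-order (gradient-based) estimate of the worst-case value are all assumptions chosen for illustration, and the names `value`, `value_grad`, `conservative_value`, and `worst_case_state` are hypothetical.

```python
import math

def value(state, goal):
    # Hypothetical toy value function: highest at the goal, lower farther away.
    return -sum((s - g) ** 2 for s, g in zip(state, goal))

def value_grad(state, goal):
    # Analytic gradient of the toy value function with respect to the state.
    return [-2.0 * (s - g) for s, g in zip(state, goal)]

def conservative_value(state, goal, eps):
    # First-order estimate of the minimum of value(state + d) over ||d||_2 <= eps:
    #   V(s) - eps * ||grad V(s)||_2
    # (an assumed approximation, used here only for illustration).
    grad_norm = math.sqrt(sum(g * g for g in value_grad(state, goal)))
    return value(state, goal) - eps * grad_norm

def worst_case_state(state, goal, eps):
    # Move the state a distance eps against the gradient, i.e. in the
    # direction that decreases the value fastest to first order.
    grad = value_grad(state, goal)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if grad_norm == 0.0:
        return list(state)  # flat gradient: no preferred direction
    return [s - eps * g / grad_norm for s, g in zip(state, grad)]

if __name__ == "__main__":
    s, goal, eps = [1.0, 0.0], [0.0, 0.0], 0.1
    print("nominal value:     ", value(s, goal))
    print("conservative value:", conservative_value(s, goal, eps))
    print("worst-case state:  ", worst_case_state(s, goal, eps))
```

In this toy setting, a policy trained against the conservative value rather than the nominal one is penalized where the value changes sharply with the state, which is one way to encourage robustness without simulating the disturbance itself.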
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control
