Confounding Robust Continuous Control via Automatic Reward Shaping
Mateo Juliani, Mingxuan Li, Elias Bareinboim

TL;DR
This paper introduces a method that automatically learns reward shaping functions for continuous control in reinforcement learning from offline data that may be affected by unobserved confounders. It uses causal inference techniques and reports strong results on benchmark tasks.
Contribution
It presents a novel approach to automatically learn confounding-robust reward shaping functions from offline data using causal Bellman equations and potential-based reward shaping.
Findings
Strong performance on continuous control benchmarks
Robustness to unobserved confounders demonstrated
First approach to take a causal perspective in this context
Abstract
Reward shaping has been applied widely to accelerate Reinforcement Learning (RL) agents' training. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains largely under-explored. In this work, we propose to automatically learn a reward shaping function for continuous control problems from offline datasets, potentially contaminated by unobserved confounding variables. Specifically, our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values, which is then used as the potentials in the Potential-Based Reward Shaping (PBRS) framework. Our proposed reward shaping algorithm is tested with Soft Actor-Critic (SAC) on multiple commonly used continuous control benchmarks and exhibits strong performance guarantees under unobserved confounders.
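To make the PBRS step concrete, here is a minimal sketch of the standard potential-based shaping rule, r' = r + γΦ(s') − Φ(s). It is not the authors' implementation. In the paper the potential Φ is a learned upper bound on the optimal state values; the `toy_potential` function below is a hypothetical stand-in used only for illustration, and treating the potential of a terminal state as zero is an assumed convention.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return r + gamma * Phi(s') - Phi(s), the potential-based shaped reward."""
    # Assumed convention: the potential of a terminal state is zero.
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def toy_potential(state: State) -> float:
    """Hypothetical potential: negative squared distance to the origin.

    The paper instead learns the potential from offline data; this function
    only stands in for that learned value bound.
    """
    return -sum(x * x for x in state)


if __name__ == "__main__":
    # Moving from (1, 0) to the origin raises the potential, so the
    # shaping term adds a bonus on top of the environment reward.
    print(shaped_reward(1.0, [1.0, 0.0], [0.0, 0.0], toy_potential, gamma=0.5))
```

In an off-policy learner such as SAC, a function like this could be applied to each transition before it is stored or sampled, leaving the rest of the training loop unchanged.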
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Neurological disorders and treatments
