Soft Action Priors: Towards Robust Policy Transfer
Matheus Centa, Philippe Preux

TL;DR
This paper introduces adaptive algorithms that leverage soft action priors, including suboptimal ones, to improve policy transfer in reinforcement learning, reporting state-of-the-art results in tabular experiments and improved stability and robustness in continuous-action deep RL.
Contribution
The paper develops adaptive methods that exploit suboptimal action priors in RL, improving robustness and performance over existing policy distillation techniques.
Findings
Achieves state-of-the-art performance in tabular experiments.
Improves stability and robustness in continuous-action deep RL.
Leverages suboptimal priors effectively for better policy transfer.
Abstract
Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework - that is, a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior - to recover state-of-the-art policy distillation techniques. Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses. In contrast to prior work, we develop algorithms for leveraging suboptimal action priors that may nevertheless impart valuable knowledge - which we call soft action priors.…
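To make the idea of an action prior concrete, here is a minimal tabular sketch of KL-regularized value iteration, the standard setting in which a prior over actions enters as a reward-shaping term. It is not the authors' adaptive algorithm: the function name `soft_value_iteration`, the fixed temperature `alpha`, and the toy two-state problem are all assumptions made for illustration. The policy it returns is proportional to `prior * exp(Q / alpha)`, so a large `alpha` trusts the prior and a small `alpha` mostly ignores it.

```python
import numpy as np

def soft_value_iteration(P, R, prior, alpha=1.0, gamma=0.9, iters=500):
    """KL-regularized value iteration with an action prior (illustrative).

    P: (S, A, S) transition probabilities.
    R: (S, A) rewards.
    prior: (S, A) action prior, strictly positive, rows summing to 1.
    alpha: fixed weight on KL(policy || prior); hypothetical parameter.
    """
    V = np.zeros(R.shape[0])
    log_prior = np.log(prior)
    for _ in range(iters):
        Q = R + gamma * (P @ V)                  # (S, A) action values
        Z = Q / alpha + log_prior
        m = Z.max(axis=1, keepdims=True)         # stabilize the log-sum-exp
        V = alpha * (m[:, 0] + np.log(np.exp(Z - m).sum(axis=1)))
    # Regularized policy: prior reweighted by exponentiated action values.
    logits = (R + gamma * (P @ V)) / alpha + log_prior
    logits -= logits.max(axis=1, keepdims=True)
    policy = np.exp(logits)
    policy /= policy.sum(axis=1, keepdims=True)
    return V, policy

# Toy problem (assumed): action 1 always pays 1, action 0 pays 0,
# but the suboptimal prior prefers action 0.
P = np.zeros((2, 2, 2))
P[:, :, 0] = 1.0                                 # every action leads to state 0
R = np.array([[0.0, 1.0], [0.0, 1.0]])
prior = np.array([[0.9, 0.1], [0.9, 0.1]])

_, trusting = soft_value_iteration(P, R, prior, alpha=100.0)
_, skeptical = soft_value_iteration(P, R, prior, alpha=0.05)
```

With a fixed `alpha`, the agent either follows a suboptimal prior too closely or discards whatever useful knowledge it carries. That trade-off is the motivation the abstract gives for adaptive methods.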
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Adversarial Robustness in Machine Learning
