Anti-Exploration by Random Network Distillation
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov

TL;DR
This paper improves Random Network Distillation (RND) for offline reinforcement learning by using FiLM conditioning, enabling effective anti-exploration and achieving competitive performance without ensembles.
Contribution
Introduces FiLM conditioning for the RND prior so that the actor can effectively minimize the anti-exploration bonus, resulting in a simple and efficient ensemble-free method for offline RL based on Soft Actor-Critic.
Findings
Achieves performance comparable to ensemble methods on D4RL.
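Outperforms other ensemble-free approaches by a wide margin.
Shows that RND's weakness as an anti-exploration penalty comes from a naive choice of conditioning for the prior, which keeps the actor from minimizing the bonus, rather than from a lack of discriminativity.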
Abstract
Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
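To make the idea in the abstract concrete, the following is a minimal NumPy sketch of an RND-style anti-exploration bonus with FiLM conditioning. It is not the authors' implementation. The function names (`init_film_net`, `film_forward`, `rnd_bonus`), the layer sizes, the single hidden layer, and the choice to let the action modulate state features in both the prior and the predictor are all assumptions made for illustration; the abstract only states that FiLM is used to condition the RND prior. FiLM itself scales and shifts a feature vector h as gamma * h + beta, where gamma and beta are computed from the conditioning input.

```python
import numpy as np


def init_film_net(rng, state_dim, action_dim, hidden_dim, out_dim):
    """Random weights for a tiny FiLM-conditioned network (hypothetical layout)."""
    def std(fan_in):
        return 1.0 / np.sqrt(fan_in)

    return {
        "W_s": rng.normal(0.0, std(state_dim), (state_dim, hidden_dim)),
        "W_gamma": rng.normal(0.0, std(action_dim), (action_dim, hidden_dim)),
        "W_beta": rng.normal(0.0, std(action_dim), (action_dim, hidden_dim)),
        "W_out": rng.normal(0.0, std(hidden_dim), (hidden_dim, out_dim)),
    }


def film_forward(params, state, action):
    """State features are scaled and shifted by action-dependent gamma and beta."""
    h = np.tanh(state @ params["W_s"])   # state features
    gamma = action @ params["W_gamma"]   # per-feature scale from the action
    beta = action @ params["W_beta"]     # per-feature shift from the action
    h = np.tanh(gamma * h + beta)        # FiLM modulation
    return h @ params["W_out"]


def rnd_bonus(predictor, prior, state, action):
    """Squared distance between predictor and frozen prior embeddings."""
    diff = film_forward(predictor, state, action) - film_forward(prior, state, action)
    return np.sum(diff ** 2, axis=-1)


# Illustrative dimensions only; they are not taken from the paper.
rng = np.random.default_rng(0)
prior = init_film_net(rng, state_dim=17, action_dim=6, hidden_dim=64, out_dim=32)
predictor = init_film_net(rng, state_dim=17, action_dim=6, hidden_dim=64, out_dim=32)

states = rng.normal(size=(4, 17))
actions = rng.uniform(-1.0, 1.0, size=(4, 6))
bonus = rnd_bonus(predictor, prior, states, actions)  # one penalty per (state, action) pair
```

In an RND setup the prior stays frozen while the predictor is trained on dataset state-action pairs to reduce this bonus, so the bonus remains larger for actions unlike those in the data. In the anti-exploration setting the abstract describes, the actor is then penalized by the bonus; the training loop and the Soft Actor-Critic integration are omitted here.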
