BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs
Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

TL;DR
This paper introduces BADDr, a scalable Bayesian RL method for POMDPs using dropout networks, unifying previous models and demonstrating competitive performance on small and large domains.
Contribution
It presents a representation-agnostic Bayesian RL framework and a novel dropout-based approach that improves scalability in partially observable environments.
Findings
Competitive with state-of-the-art BRL on small domains
Able to solve larger POMDPs effectively
Belief inference is more scalable with dropout networks
Abstract
While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search…
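As a rough illustration of why dropout networks can stand in for a belief over dynamics, the sketch below draws several dropout masks over one small network and treats each masked network as one sampled dynamics model. This is not the authors' implementation: the network sizes, the random weights standing in for a trained model, and the helper names (`make_network`, `sample_mask`, `predict`) are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_network(n_in, n_hidden, n_out, rng):
    """Random fixed weights standing in for a trained dynamics network (assumption)."""
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def sample_mask(n_hidden, keep_prob, rng):
    """One dropout mask; each mask selects one sampled model."""
    return [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n_hidden)]


def predict(network, mask, features, keep_prob):
    """Next-state distribution predicted by the network under a fixed mask."""
    w1, w2 = network
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in w1]
    # Apply the mask with inverted-dropout scaling.
    hidden = [h * m / keep_prob for h, m in zip(hidden, mask)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return softmax(logits)


rng = random.Random(0)
n_hidden, n_next_states, keep_prob = 16, 4, 0.5
network = make_network(3, n_hidden, n_next_states, rng)
features = [1.0, 0.0, 1.0]  # hypothetical encoding of a (state, action) pair

# Each mask yields one plausible dynamics model; together they approximate
# a distribution over models rather than a single point estimate.
samples = [
    predict(network, sample_mask(n_hidden, keep_prob, rng), features, keep_prob)
    for _ in range(100)
]
mean_prediction = [
    sum(s[i] for s in samples) / len(samples) for i in range(n_next_states)
]
print(mean_prediction)
```

The spread of `samples` around `mean_prediction` is what carries the model uncertainty here; a planner such as Monte-Carlo tree search could, in principle, simulate with a different sampled mask per rollout.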
Taxonomy
Topics: Data Stream Mining Techniques · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Monte-Carlo Tree Search · Dropout
