VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning

Joery A. de Vries; Jinke He; Yaniv Oren; Pascal R. van der Vaart; Mathijs M. de Weerdt; Matthijs T. J. Spaan

arXiv:2602.18857 · cs.LG · February 24, 2026

Open Access

TL;DR

This paper introduces VariBASed, a variational Bayes-adaptive planning method for deep reinforcement learning. It combines variational belief learning, sequential Monte-Carlo planning, and meta-reinforcement learning, and in a single-GPU setup it improves sample and runtime efficiency over prior methods.

Contribution

It proposes a novel variational framework that unifies belief learning, Monte-Carlo planning, and meta-reinforcement learning for scalable Bayes-adaptive RL.

Findings

01. Favorable scaling to larger planning budgets.

02. Improved sample-efficiency over prior methods.

03. Reduced runtime in deep RL planning tasks.

Abstract

Optimally trading off exploration and exploitation is the holy grail of reinforcement learning, as it promises maximal data efficiency for solving any task. Bayes-optimal agents achieve this, but obtaining the belief state and planning over it are both typically intractable. Although deep learning methods can greatly help in scaling this computation, existing methods are still costly to train. To accelerate training, this paper proposes a variational framework for learning and planning in Bayes-adaptive Markov decision processes that coalesces variational belief learning, sequential Monte-Carlo planning, and meta-reinforcement learning. In a single-GPU setup, our new method VariBASeD exhibits favorable scaling to larger planning budgets, improving sample and runtime efficiency over prior methods.
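To make the notion of a Bayes-optimal agent concrete, the following is a minimal sketch of exact Bayes-adaptive planning in a toy problem. It is not the paper's method. It assumes a Bernoulli bandit with independent Beta priors, where the belief state is just the tuple of Beta parameters and planning is a finite-horizon dynamic program over beliefs. The function name `bayes_optimal_value` and the belief encoding are hypothetical choices made for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_optimal_value(belief, horizon):
    """Expected total reward of the Bayes-optimal policy (toy illustration).

    belief:  tuple of (alpha, beta) Beta parameters, one pair per arm.
    horizon: number of pulls remaining.
    """
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean probability of reward 1
        # Belief after observing a success or a failure on this arm.
        win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
        lose = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
        q = (p * (1.0 + bayes_optimal_value(win, horizon - 1))
             + (1.0 - p) * bayes_optimal_value(lose, horizon - 1))
        best = max(best, q)
    return best


uniform = ((1, 1), (1, 1))  # two arms, uniform Beta(1, 1) priors
value_two_pulls = bayes_optimal_value(uniform, 2)
```

With two pulls and uniform priors, the value exceeds the 1.0 that an agent ignoring its first observation would expect, because the first pull also buys information. The number of reachable beliefs grows quickly with the horizon and the number of arms, which hints at why exact planning of this kind does not carry over to the large problems the abstract targets.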

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Explainable Artificial Intelligence (XAI)