Metareasoning in uncertain environments: a meta-BAMDP framework
Prakhar Godara, Tilman Diego Alemán

TL;DR
This paper introduces a meta-BAMDP framework for metareasoning in uncertain environments, extending traditional models to handle unknown reward and transition distributions, and applies it to Bernoulli bandit tasks with novel theoretical insights.
Contribution
It generalizes metareasoning models by proposing the meta-BAMDP framework for unknown environments and introduces two theorems that improve problem tractability and approximation quality.
Findings
The framework applies to Bernoulli bandit tasks.
Two theorems improve problem tractability and robustness.
Provides testable predictions for human exploration behavior.
Abstract
Reasoning may be viewed as an algorithm that makes a choice of an action, aiming to optimize some outcome. However, executing the algorithm itself bears costs (time, energy, limited capacity, etc.), and these costs need to be considered alongside the explicit utility obtained by making the choice in the underlying decision problem. Finding the right algorithm can itself be framed as an optimization problem over the space of reasoning processes, generally referred to as metareasoning. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems…
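To make the trade-off between reasoning cost and task utility concrete, here is a minimal sketch on a two-armed Bernoulli bandit treated as a Bayes-adaptive MDP. It is not the paper's meta-BAMDP algorithm. The agent's belief state is a pair of (successes, failures) counts per arm under Beta(1, 1) priors. "Reasoning" is modeled as a depth-limited Bayes-adaptive lookahead with a myopic leaf heuristic. The meta-level choice picks the lookahead depth `d` that maximizes the Bayes expected return minus a reasoning cost. The function names (`q_values`, `policy_return`, `reasoning_cost`, `best_depth`), the leaf heuristic, and the linear cost of `c` per unit of depth per decision are all hypothetical assumptions chosen for brevity.

```python
from functools import lru_cache

# Belief state: one (successes, failures) pair per arm, under Beta(1, 1) priors.
Belief = tuple[tuple[int, int], ...]


def posterior_mean(arm: tuple[int, int]) -> float:
    """Posterior mean (= predictive success probability) under a Beta(1, 1) prior."""
    s, f = arm
    return (s + 1) / (s + f + 2)


def update(belief: Belief, a: int, success: bool) -> Belief:
    """Bayesian update of arm a's counts after observing one pull."""
    s, f = belief[a]
    new_arm = (s + 1, f) if success else (s, f + 1)
    return belief[:a] + (new_arm,) + belief[a + 1:]


@lru_cache(maxsize=None)
def value(belief: Belief, steps_left: int, depth: int) -> float:
    """Depth-limited Bayes-adaptive value of a belief state."""
    if steps_left == 0:
        return 0.0
    if depth == 0:
        # Hypothetical myopic leaf heuristic: exploit the best posterior mean.
        return steps_left * max(posterior_mean(arm) for arm in belief)
    return max(q_values(belief, steps_left, depth))


@lru_cache(maxsize=None)
def q_values(belief: Belief, steps_left: int, depth: int) -> tuple[float, ...]:
    """Bellman backup over both possible outcomes of pulling each arm."""
    qs = []
    for a in range(len(belief)):
        p = posterior_mean(belief[a])
        win = value(update(belief, a, True), steps_left - 1, depth - 1)
        lose = value(update(belief, a, False), steps_left - 1, depth - 1)
        qs.append(p * (1 + win) + (1 - p) * lose)
    return tuple(qs)


@lru_cache(maxsize=None)
def policy_return(belief: Belief, steps_left: int, d: int) -> float:
    """Exact Bayes expected return of the policy that replans with depth d."""
    if steps_left == 0:
        return 0.0
    qs = q_values(belief, steps_left, min(d, steps_left))
    a = max(range(len(qs)), key=qs.__getitem__)  # first arm among ties
    p = posterior_mean(belief[a])
    return (p * (1 + policy_return(update(belief, a, True), steps_left - 1, d))
            + (1 - p) * policy_return(update(belief, a, False), steps_left - 1, d))


def reasoning_cost(d: int, horizon: int, c: float) -> float:
    """Hypothetical cost model: c per unit of lookahead depth at each decision."""
    return c * sum(min(d, t) for t in range(1, horizon + 1))


def best_depth(horizon: int, c: float, n_arms: int = 2) -> int:
    """Meta-level choice: the depth maximizing expected return minus cost."""
    prior = tuple((0, 0) for _ in range(n_arms))
    net = {d: policy_return(prior, horizon, d) - reasoning_cost(d, horizon, c)
           for d in range(1, horizon + 1)}
    return max(net, key=net.get)  # smallest depth among ties


if __name__ == "__main__":
    for c in (0.0, 0.001, 0.1):
        print(f"c={c}: best lookahead depth = {best_depth(6, c)}")
```

With `c = 0` the meta-level choice favors the deepest useful lookahead. As `c` grows, it shifts toward shallow, cheaper reasoning. This is the qualitative tension the abstract describes, shown here only under the assumed cost model.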
Taxonomy
Methods: Sparse Evolutionary Training
