Metareasoning in uncertain environments: a meta-BAMDP framework

Prakhar Godara; Tilman Diego Alemán

arXiv:2408.01253·cs.AI·February 12, 2026

TL;DR

This paper introduces a meta-BAMDP framework for metareasoning in environments whose reward and transition distributions are unknown, extending models that assume a known MDP. It applies the framework to Bernoulli bandit tasks and introduces two theorems that improve tractability and approximation quality.

Contribution

It generalizes metareasoning models by proposing the meta-BAMDP framework for unknown environments and introduces two theorems that improve problem tractability and approximation quality.

Findings

01 The framework applies to Bernoulli bandit tasks.

02 Theorems enhance problem tractability and robustness.

03 Provides testable predictions for human exploration behavior.

Abstract

Reasoning may be viewed as an algorithm $P$ that makes a choice of an action $a^{*} \in A$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as metareasoning. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems…
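The trade-off the abstract describes, utility from a better choice versus the cost of the reasoning that produces it, can be illustrated on a small Bernoulli bandit. The sketch below is not the paper's meta-BAMDP. It is a simplified, one-shot comparison under stated assumptions: each arm has a uniform Beta(1, 1) prior, the agent either plans fully (the Bayes-optimal policy, computed by dynamic programming over beliefs) or never plans (a greedy policy on posterior means), and `planning_cost` is a hypothetical scalar charged for planning. All function names are illustrative.

```python
from functools import lru_cache


def posterior_mean(wins, losses):
    """Mean of Beta(1 + wins, 1 + losses), i.e. a uniform prior on the payoff rate."""
    return (wins + 1) / (wins + losses + 2)


def successors(belief, arm):
    """Return (p_win, belief_after_win, belief_after_loss) for pulling `arm`.

    `belief` is a tuple of (wins, losses) pairs, one pair per arm.
    """
    wins, losses = belief[arm]
    after_win = belief[:arm] + ((wins + 1, losses),) + belief[arm + 1:]
    after_loss = belief[:arm] + ((wins, losses + 1),) + belief[arm + 1:]
    return posterior_mean(wins, losses), after_win, after_loss


def expected_return(belief, arm, steps_left, continuation):
    """Expected total reward of pulling `arm` now, then following `continuation`."""
    p, after_win, after_loss = successors(belief, arm)
    return (p * (1.0 + continuation(after_win, steps_left - 1))
            + (1.0 - p) * continuation(after_loss, steps_left - 1))


@lru_cache(maxsize=None)
def planned_value(belief, steps_left):
    """Expected total reward of the Bayes-optimal policy (full planning)."""
    if steps_left == 0:
        return 0.0
    return max(expected_return(belief, arm, steps_left, planned_value)
               for arm in range(len(belief)))


@lru_cache(maxsize=None)
def myopic_value(belief, steps_left):
    """Expected total reward of a greedy policy that never plans ahead."""
    if steps_left == 0:
        return 0.0
    # Ties go to the lowest arm index.
    greedy_arm = max(range(len(belief)), key=lambda a: posterior_mean(*belief[a]))
    return expected_return(belief, greedy_arm, steps_left, myopic_value)


def net_benefit_of_planning(belief, steps_left, planning_cost):
    """Gain from planning over acting greedily, minus a hypothetical planning cost."""
    return planned_value(belief, steps_left) - myopic_value(belief, steps_left) - planning_cost


if __name__ == "__main__":
    start = ((0, 0), (0, 0))  # two arms, no observations yet
    for horizon in (2, 5, 10):
        print(horizon, net_benefit_of_planning(start, horizon, planning_cost=0.1))
```

A positive result means planning is worth its assumed cost at that horizon; a negative one means the greedy agent does better once the cost is counted. In a full metareasoning model the agent would choose how much to plan at every step rather than once up front, which this sketch deliberately leaves out.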

Taxonomy

Methods: Sparse Evolutionary Training