$\kappa$-Explorer: A Unified Framework for Active Model Estimation in MDPs

Xihe Gu; Urbashi Mitra; Tara Javidi

arXiv:2602.20404 · cs.LG · February 25, 2026

TL;DR

The paper introduces $\kappa$-Explorer, a unified active exploration framework for tabular MDPs. It allocates visitation frequencies according to the intrinsic estimation complexity of each transition distribution by optimizing a parameterized concave objective $U_\kappa$, and comes with theoretical guarantees and a practical algorithm.

Contribution

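It proposes a new family of objective functions $U_\kappa$ for exploration. The curvature parameter $\kappa$ unifies several estimation goals, such as average-case and worst-case error, and the objective admits an efficient Frank-Wolfe-based algorithm for active model estimation in MDPs.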
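To make the Frank-Wolfe idea concrete, here is a minimal sketch of the generic method for maximizing a concave function over a probability simplex. It is not the paper's algorithm: the objective $\sum_i c_i \log x_i$ is a hypothetical stand-in for $U_\kappa$, the weights `c` are made-up "complexity" values, and the simplex replaces the set of feasible visitation frequencies of an MDP. The function name `frank_wolfe_simplex` is likewise an assumption for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, dim, n_iters=2000):
    """Generic Frank-Wolfe for maximizing a concave function over the simplex.

    `grad` returns the gradient of the objective at x. This is a textbook
    sketch, not the update rule used by kappa-Explorer.
    """
    x = np.full(dim, 1.0 / dim)  # start from the uniform allocation
    for t in range(n_iters):
        g = grad(x)
        # Linear maximization over the simplex: the best vertex is the
        # coordinate with the largest gradient entry.
        s = np.zeros(dim)
        s[np.argmax(g)] = 1.0
        # Diminishing step; shifted so it is always < 1, which keeps every
        # coordinate strictly positive (needed for the log objective below).
        step = 2.0 / (t + 3.0)
        x = (1.0 - step) * x + step * s
    return x

# Hypothetical stand-in objective: f(x) = sum_i c_i * log(x_i).
# Its maximizer over the simplex is x_i = c_i / sum(c).
c = np.array([1.0, 2.0, 3.0])

def grad_f(x):
    return c / x

x_hat = frank_wolfe_simplex(grad_f, dim=3)
print(x_hat)  # should be close to c / c.sum()
```

In this toy problem the allocation concentrates on coordinates with larger weights, which mirrors the qualitative idea of visiting harder-to-estimate state-action pairs more often. In an actual MDP the linear maximization step would have to respect the transition dynamics rather than pick a simplex vertex.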

Findings

01

$\kappa$-Explorer outperforms existing exploration methods in benchmark MDPs.

02

The framework provides tight regret guarantees for active exploration.

03

A practical online surrogate algorithm is developed for real-world applications.

Abstract

In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned on state-action pairs. Consequently, accurate model estimation depends on how the exploration policy allocates visitation frequencies in accordance with the intrinsic complexity of each transition distribution. Building on recent work on coverage-based exploration, we introduce a parameterized family of decomposable and concave objective functions $U_{\kappa}$ that explicitly incorporate both intrinsic estimation complexity and extrinsic visitation frequency. Moreover, the curvature $\kappa$ provides a unified treatment of various global objectives, such as the average-case and worst-case estimation error objectives. Using the closed-form characterization of the gradient of $U_{\kappa}$, we propose $\kappa$-Explorer, an active…

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference