$\kappa$-Explorer: A Unified Framework for Active Model Estimation in MDPs
Xihe Gu, Urbashi Mitra, Tara Javidi

TL;DR
The paper introduces $\kappa$-Explorer, a unified active exploration framework for MDPs that optimally balances exploration and estimation accuracy using a novel objective function, with proven guarantees and practical algorithms.
Contribution
It proposes a new family of objective functions $U_\kappa$ for exploration, unifying various goals and enabling an efficient Frank-Wolfe-based algorithm for active model estimation in MDPs.
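To make the Frank-Wolfe idea concrete, here is a minimal sketch of Frank-Wolfe maximizing a concave, decomposable objective over visitation frequencies. It rests on several assumptions that do not come from the paper: the objective $-\sum_i c_i / d_i$ is a stand-in for an average-case estimation error and is not the paper's $U_\kappa$; the `complexity` values are hypothetical; and the feasible set is simplified to the probability simplex, ignoring the flow constraints that an MDP's dynamics impose on reachable visitation frequencies. The function names are illustrative only.

```python
import math


def frank_wolfe_allocation(complexity, num_steps=6000):
    """Maximize the stand-in objective -sum_i complexity[i] / d[i] over the simplex."""
    n = len(complexity)
    d = [1.0 / n] * n  # uniform start keeps every coordinate strictly positive
    for t in range(1, num_steps + 1):
        # Gradient of the stand-in objective: d/dd_i (-c_i / d_i) = c_i / d_i^2.
        grad = [c / (x * x) for c, x in zip(complexity, d)]
        # Linear maximization oracle on the simplex: the best vertex is the
        # coordinate with the largest gradient entry.
        best = max(range(n), key=lambda i: grad[i])
        # Averaging step size 1/(t+1): d stays an average of the start point
        # and the chosen vertices, so no coordinate ever reaches zero.
        gamma = 1.0 / (t + 1)
        d = [(1.0 - gamma) * x for x in d]
        d[best] += gamma
    return d


# Hypothetical per-pair estimation complexities (assumed for illustration).
complexity = [1.0, 4.0, 9.0]
allocation = frank_wolfe_allocation(complexity)

# For this stand-in objective the maximizer is d_i proportional to sqrt(c_i).
norm = sum(math.sqrt(c) for c in complexity)
closed_form = [math.sqrt(c) / norm for c in complexity]
print(allocation, closed_form)
```

With the averaging step size, each iteration behaves like adding one more visit to the state-action pair whose gradient is currently largest, so the iterate can be read as an empirical visitation frequency. For this particular stand-in objective the result should approach an allocation proportional to $\sqrt{c_i}$: harder-to-estimate pairs receive more visits, but less than proportionally to their complexity.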
Findings
$\kappa$-Explorer outperforms existing exploration methods in benchmark MDPs.
The framework provides tight regret guarantees for active exploration.
A practical online surrogate algorithm is developed for real-world applications.
Abstract
In tabular Markov decision processes (MDPs) with perfect state observability, each trajectory provides active samples from the transition distributions conditioned on state-action pairs. Consequently, accurate model estimation depends on how the exploration policy allocates visitation frequencies in accordance with the intrinsic complexity of each transition distribution. Building on recent work on coverage-based exploration, we introduce a parameterized family of decomposable and concave objective functions $U_\kappa$ that explicitly incorporate both intrinsic estimation complexity and extrinsic visitation frequency. Moreover, the curvature $\kappa$ provides a unified treatment of various global objectives, such as the average-case and worst-case estimation error objectives. Using the closed-form characterization of the gradient of $U_\kappa$, we propose $\kappa$-Explorer, an active…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
