Autonomous exploration for navigating in non-stationary CMPs
Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

TL;DR
This paper introduces a new framework for autonomous navigation in non-stationary controlled Markov processes, proposing a performance measure and a meta-algorithm with theoretical guarantees on exploration efficiency amid abrupt environment changes.
Contribution
It presents a novel performance measure and a meta-algorithm for navigation in non-stationary CMPs, with a proven upper bound on exploration steps in terms of the number of environment changes.
Findings
Proposed exploration steps as a new performance measure.
Developed the MNM meta-algorithm for non-stationary environments.
Proved an upper bound on exploration steps in terms of the number of changes.
Abstract
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps, which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the exploration steps in terms of the number of changes.
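To make the counting idea behind exploration steps concrete, here is a minimal Python sketch. It is not the paper's formal definition and not the MNM algorithm: the per-step "sufficient knowledge" flag, the function names, and the toy rule that the learner needs a fixed number of steps to relearn after each abrupt change are all assumptions made purely for illustration.

```python
from typing import Iterable, List


def count_exploration_steps(sufficient_knowledge: Iterable[bool]) -> int:
    """Count the time steps flagged as lacking sufficient knowledge.

    `sufficient_knowledge` is a hypothetical per-step flag; how such a flag
    would be defined formally is specified in the paper, not here.
    """
    return sum(1 for ok in sufficient_knowledge if not ok)


def toy_knowledge_flags(
    horizon: int, change_points: List[int], relearn_steps: int
) -> List[bool]:
    """Toy model (assumption): at the start and after each abrupt change,
    the learner needs `relearn_steps` steps before it is again considered
    knowledgeable enough to navigate efficiently."""
    changes = set(change_points)
    flags = []
    last_reset = 0  # time of the most recent change (or the start)
    for t in range(horizon):
        if t in changes:
            last_reset = t
        flags.append(t - last_reset >= relearn_steps)
    return flags


# Two abrupt changes, 10 steps to relearn after the start and after each change.
flags = toy_knowledge_flags(horizon=100, change_points=[40, 70], relearn_steps=10)
exploration_steps = count_exploration_steps(flags)
```

In this toy model the count grows with the number of changes, which is the kind of dependence the paper's bound is stated in terms of; the actual form of the bound is given in the paper and is not reproduced by this sketch.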
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques
