Active Exploration via Experiment Design in Markov Chains
Mojmír Mutný, Tadeusz Janik, Andreas Krause

TL;DR
This paper introduces \textsc{markov-design}, an algorithm for optimal experiment design in Markov chain settings, enabling adaptive policy selection with proven convergence to the best measurement allocation.
Contribution
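We develop a sequential, adaptive algorithm for experiment design in Markov chains. It selects policies whose measurement allocation is guaranteed to converge to the optimal one, with applications ranging from exploration in reinforcement learning to spatial monitoring tasks.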
Findings
The measurement allocation of the selected policies provably converges to the optimal allocation
Demonstrated effectiveness in ecological surveillance
Validated in pharmacology applications
Abstract
A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a {\em Markov chain}, and we can only choose them by selecting a {\em policy} controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm -- \textsc{markov-design} -- that efficiently selects policies whose measurement allocation \emph{provably converges to the optimal one}. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our…
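To make the setting concrete, the following is a minimal sketch of the objects the abstract describes, not the \textsc{markov-design} algorithm itself. It assumes a small tabular Markov chain with known transitions, computes the measurement allocation (average state-visitation frequencies) that a policy induces, and scores that allocation with a D-optimal log-determinant utility. The function names, the toy three-state chain, the feature vectors, and the choice of utility are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def state_visitation(P, policy, mu0, horizon):
    """Average state-visitation frequencies over `horizon` steps.

    P:      (S, A, S) transition probabilities, P[s, a, s'].
    policy: (S, A) action probabilities for each state.
    mu0:    (S,) initial state distribution.
    """
    # Markov chain induced by the policy:
    # P_pi[s, s'] = sum_a policy[s, a] * P[s, a, s'].
    P_pi = np.einsum("sa,sat->st", policy, P)
    d = np.zeros_like(mu0, dtype=float)
    mu = mu0.astype(float)
    for _ in range(horizon):
        d += mu          # count the states visited at this step
        mu = mu @ P_pi   # advance the state distribution by one step
    return d / horizon


def d_optimal_utility(d, features, reg=1e-3):
    """log det of the regularized information matrix sum_s d[s] x_s x_s^T.

    This is one classical design utility, used here as an assumption.
    """
    info = features.T @ (d[:, None] * features)
    info += reg * np.eye(features.shape[1])
    _, logdet = np.linalg.slogdet(info)
    return logdet


# Hypothetical toy chain: 3 states on a cycle, 2 actions.
# Action 0 stays in place, action 1 moves to the next state.
S, A = 3, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 1.0
    P[s, 1, (s + 1) % S] = 1.0

mu0 = np.array([1.0, 0.0, 0.0])   # always start in state 0
features = np.eye(S)              # one independent measurement per state

always_stay = np.tile([1.0, 0.0], (S, 1))
always_move = np.tile([0.0, 1.0], (S, 1))

d_stay = state_visitation(P, always_stay, mu0, horizon=3)
d_move = state_visitation(P, always_move, mu0, horizon=3)
u_stay = d_optimal_utility(d_stay, features)
u_move = d_optimal_utility(d_move, features)
```

In this toy example the policy that keeps moving spreads its measurements evenly over the three states and therefore obtains a higher utility than the policy that never leaves the initial state. The point of the sketch is only that the experimenter controls the allocation indirectly, through the policy, rather than by picking states directly.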
Taxonomy
Topics: Reinforcement Learning in Robotics · Gene Regulatory Network Analysis · Machine Learning and Algorithms
