TL;DR
This paper introduces a combined inference and deep reinforcement learning framework for solving POMDPs with uncertain models, demonstrated on railway maintenance planning, enhancing robustness and applicability in real-world scenarios.
Contribution
It presents a novel approach integrating Bayesian inference with deep RL for POMDPs, including a hybrid model-based/model-free method and application to railway maintenance.
Findings
The framework effectively infers POMDP parameters from data.
Deep RL solutions are robust to model uncertainty via domain randomization.
Hybrid approaches outperform purely model-free methods in the application.
Abstract
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the lack of availability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), require the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
