L*-Based Learning of Markov Decision Processes (Extended Version)
Martin Tappler, Bernhard K. Aichernig, Giovanni Bacci, Maria Eichlseder, Kim G. Larsen

TL;DR
This paper extends L*-based automata learning techniques to deterministic Markov decision processes, proposing a novel sampling-based algorithm that learns complete model structures and outperforms passive methods in accuracy.
Contribution
It introduces a new L*-based learning algorithm for Markov decision processes that relaxes perfect information assumptions and learns full model structures from sampled traces.
Findings
The sampling-based algorithm achieves higher accuracy than passive learning methods.
The algorithm learns complete model structures including states.
Experiments show this accuracy gain over passive learning when both are given the same amount of test data.
Abstract
Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient. An influential active learning technique is Angluin's L* algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves…
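To make the setting concrete, the sketch below shows what "deterministic" and "sampling system traces" can mean for a labelled Markov decision process. It is only an illustration under stated assumptions, not the paper's learning algorithm: the coin-flip model, the state names, and the helpers `is_deterministic`, `sample_trace`, and `estimate_output_frequencies` are all hypothetical. Determinism is taken here to mean that, for every state and input, no two possible successor states share the same output label, so an observed trace of inputs and outputs identifies the current state uniquely.

```python
import random
from collections import Counter

# Hypothetical toy model (not from the paper): a biased coin-flip machine.
# LABEL maps each state to the output observed when the state is entered.
LABEL = {"s0": "init", "s1": "heads", "s2": "tails"}

# TRANSITIONS[state][input] is a list of (successor, probability) pairs.
TRANSITIONS = {
    "s0": {"flip": [("s1", 0.7), ("s2", 0.3)]},
    "s1": {"flip": [("s1", 0.7), ("s2", 0.3)]},
    "s2": {"flip": [("s1", 0.7), ("s2", 0.3)]},
}


def is_deterministic(transitions, label):
    """Return True if no (state, input) pair has two possible
    successors carrying the same output label."""
    for by_input in transitions.values():
        for successors in by_input.values():
            labels = [label[s] for s, p in successors if p > 0]
            if len(labels) != len(set(labels)):
                return False
    return True


def sample_trace(inputs, rng, start="s0"):
    """Execute the inputs once and return the observed trace:
    initial output, then alternating inputs and outputs."""
    state = start
    trace = [LABEL[state]]
    for symbol in inputs:
        successors = TRANSITIONS[state][symbol]
        states = [s for s, _ in successors]
        weights = [p for _, p in successors]
        state = rng.choices(states, weights=weights, k=1)[0]
        trace += [symbol, LABEL[state]]
    return trace


def estimate_output_frequencies(n, rng):
    """Estimate the output distribution after one 'flip' from n samples."""
    counts = Counter(sample_trace(["flip"], rng)[-1] for _ in range(n))
    return {output: count / n for output, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(is_deterministic(TRANSITIONS, LABEL))
    print(sample_trace(["flip", "flip"], rng))
    print(estimate_output_frequencies(2000, rng))
```

A learner working from samples only sees traces like the one returned by `sample_trace`, so transition probabilities have to be estimated from observed frequencies, as `estimate_output_frequencies` does for a single input. The estimates approach the true values (0.7 and 0.3 in this toy model) as the number of samples grows.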
