Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version)
Ronen Brafman, Giuseppe De Giacomo, Fabio Patrizi

TL;DR
This paper introduces a novel approach to specify non-Markovian rewards in MDPs using LDLf, enabling more expressive reward functions over finite traces with improved automata construction.
Contribution
It presents a new method leveraging LDLf for specifying non-Markovian rewards and provides an automata construction with minimality and compositionality guarantees.
Findings
Automata construction for LDLf-based rewards is efficient and minimal.
The approach extends previous models with strong theoretical guarantees.
Enables expressive reward specifications over finite traces.
Abstract
In Markov Decision Processes (MDPs), the reward obtained in a state depends on the properties of the last state and action. This state dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle such non-Markovian reward function was the subject of two previous lines of work, both using variants of LTL to specify the reward function and then compiling the new model back into a Markovian model. Building upon recent progress in the theories of temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · semigroups and automata theory · Logic, programming, and type systems
