A sojourn-based approach to semi-Markov Reinforcement Learning
Giacomo Ascione, Salvatore Cuomo

TL;DR
This paper introduces a semi-Markov decision process framework built on sojourn times, so that the agent's action can depend on how long the process has been in its current state, and evaluates it with $Q$-learning and deep reinforcement learning on toy examples.
Contribution
It presents a new approach to semi-Markov decision processes based on sojourn times, with a numerical $Q$-learning method and comparative evaluations.
Findings
The $Q$-learning algorithm effectively handles sojourn-time-dependent rewards.
Deep reinforcement learning shows promise in semi-Markov environments.
Toy examples demonstrate the approach's applicability and advantages.
Abstract
In this paper we introduce a new approach to discrete-time semi-Markov decision processes based on the sojourn time process. Different characterizations of discrete-time semi-Markov processes are exploited and decision processes are constructed by their means. With this new approach, the agent is allowed to consider different actions depending also on the sojourn time of the process in the current state. A numerical method based on $Q$-learning algorithms for finite horizon reinforcement learning and stochastic recursive relations is investigated. Finally, we consider two toy examples: one in which the reward depends on the sojourn time, according to the gambler's fallacy; the other in which the environment is semi-Markov even if the reward function does not depend on the sojourn time. These are used to carry out some numerical evaluations on the previously presented $Q$-learning…
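To make the idea in the abstract concrete, the following is a minimal sketch of finite-horizon tabular $Q$-learning on a state augmented with the sojourn time. It is not the authors' algorithm or either of their toy examples: the two-state environment, the hazard `leave_prob`, the prediction reward, and the names `step`, `greedy`, and `q_learning` are all assumptions made for illustration. The reward here does not depend on the sojourn time, yet the best action does, because the chance of leaving the current state grows with the time already spent in it.

```python
import random
from collections import defaultdict

N_STATES, N_ACTIONS, HORIZON = 2, 2, 10

def leave_prob(sojourn):
    # Hypothetical hazard rate: the longer the stay, the likelier a jump.
    return min(1.0, 0.2 * (sojourn + 1))

def step(state, sojourn, action, rng):
    """One transition of the augmented (state, sojourn) process."""
    if rng.random() < leave_prob(sojourn):
        next_state, next_sojourn = 1 - state, 0      # a jump resets the clock
    else:
        next_state, next_sojourn = state, sojourn + 1
    # Hypothetical reward: 1 for correctly predicting the next state.
    reward = 1.0 if action == next_state else 0.0
    return next_state, next_sojourn, reward

def greedy(Q, t, state, sojourn):
    """Action with the largest current Q-value (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: Q[(t, state, sojourn, a)])

def q_learning(episodes=20000, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)    # Q[(t, state, sojourn, action)]; zero at t = HORIZON
    visits = defaultdict(int)
    for _ in range(episodes):
        s, g = rng.randrange(N_STATES), 0
        for t in range(HORIZON):
            # Epsilon-greedy exploration over the augmented state (s, g).
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = greedy(Q, t, s, g)
            s2, g2, r = step(s, g, a, rng)
            # Undiscounted finite-horizon target, indexed by the time step.
            target = r + max(Q[(t + 1, s2, g2, b)] for b in range(N_ACTIONS))
            visits[(t, s, g, a)] += 1
            alpha = 1.0 / visits[(t, s, g, a)]       # decreasing step size
            Q[(t, s, g, a)] += alpha * (target - Q[(t, s, g, a)])
            s, g = s2, g2
    return Q
```

Under these assumptions one would expect the learned policy to predict "stay" right after a jump, when the sojourn time is 0 and the leave probability is 0.2, and to predict "jump" once the sojourn time reaches 3, when the leave probability is 0.8. A policy that sees only the state, and not the sojourn time, cannot express that switch.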
Taxonomy
Topics: Game Theory and Applications · Decision-Making and Behavioral Economics · Complex Systems and Time Series Analysis
