About Time: Model-free Reinforcement Learning with Timed Reward Machines
Rajarshi Roy, Anirban Majumdar, Ritam Raha, David Parker, Marta Kwiatkowska

TL;DR
This paper introduces timed reward machines (TRMs), extending reward machines with timing constraints to enable time-sensitive reinforcement learning, and demonstrates their effectiveness in learning policies that satisfy timing specifications.
Contribution
The paper proposes TRMs that incorporate timing constraints into reward structures and develops model-free RL algorithms to learn policies with these constraints.
Findings
The proposed algorithms learn policies that achieve high reward while satisfying the timing constraints.
Performance varies across TRM semantics (digital and real-time), showing the importance of timing.
Counterfactual-imagining heuristics improve learning efficiency.
Abstract
Reward specification plays a central role in reinforcement learning (RL), guiding the agent's behavior. To express non-Markovian rewards, formalisms such as reward machines have been introduced to capture dependencies on histories. However, traditional reward machines lack the ability to model precise timing constraints, limiting their use in time-sensitive applications. In this paper, we propose timed reward machines (TRMs), which are an extension of reward machines that incorporate timing constraints into the reward structure. TRMs enable more expressive specifications with tunable reward logic, for example, imposing costs for delays and granting rewards for timely actions. We study model-free RL frameworks (i.e., tabular Q-learning) for learning optimal policies with TRMs under digital and real-time semantics. Our algorithms integrate the TRM into learning via abstractions of timed…
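To make the idea of "costs for delays and rewards for timely actions" concrete, here is a minimal sketch of a timed reward machine under a digital (integer-tick) reading of time. It is an illustration only and not the paper's formal definition: the names `Edge`, `TimedRewardMachine`, `tick_cost`, and `run`, the single clock, and the closed-interval guards are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical TRM edge: fires on `label` when lo <= clock <= hi."""
    source: str
    label: str
    lo: int        # inclusive lower bound on the clock (assumed guard form)
    hi: int        # inclusive upper bound on the clock
    reward: float
    target: str
    reset: bool    # whether the clock is reset to 0 after firing


class TimedRewardMachine:
    """Single-clock machine with integer time steps (an assumed simplification)."""

    def __init__(self, initial, edges, tick_cost=0.0):
        self.initial = initial
        self.edges = edges
        self.tick_cost = tick_cost  # cost charged for every elapsed time unit

    def step(self, state, clock, label):
        """Let one time unit pass, then fire the first enabled edge, if any."""
        clock += 1
        reward = -self.tick_cost
        for e in self.edges:
            if e.source == state and e.label == label and e.lo <= clock <= e.hi:
                return e.target, (0 if e.reset else clock), reward + e.reward
        return state, clock, reward


# Reaching "goal" within 3 ticks pays 10; reaching it later pays only 1.
edges = [
    Edge("u0", "goal", 0, 3, 10.0, "u1", True),
    Edge("u0", "goal", 4, 10**9, 1.0, "u1", True),
]
trm = TimedRewardMachine("u0", edges, tick_cost=0.5)


def run(labels):
    """Feed a sequence of labels to the machine and sum the rewards."""
    state, clock, total = trm.initial, 0, 0.0
    for label in labels:
        state, clock, r = trm.step(state, clock, label)
        total += r
    return state, total


print(run(["none", "none", "goal"]))   # timely arrival
print(run(["none"] * 4 + ["goal"]))    # late arrival
```

In a tabular Q-learning setup, one plausible way to use such a machine is to index the Q-table by the triple (environment state, machine state, clock value), with the clock capped at the largest constant appearing in any guard so the table stays finite. That capping is a standard trick assumed here, not a detail taken from the abstract.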
