Reward Machines for Deep RL in Noisy and Uncertain Environments
Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith

TL;DR
This paper investigates how Reward Machines can be used to improve deep reinforcement learning in environments with noise and uncertainty, by exploiting task structure despite partial observability and noisy sensing.
Contribution
The paper characterizes the use of Reward Machines under an uncertain interpretation of the domain vocabulary as a POMDP, and proposes a suite of RL algorithms that exploit Reward Machine structure in that setting.
Findings
Naive approaches fail under noisy conditions.
Structured reward representations improve learning efficiency.
Task structure can be exploited despite noisy domain vocabularies.
Abstract
Reward Machines provide an automaton-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing the underlying structure of a reward function, they enable the decomposition of an RL task, leading to impressive gains in sample efficiency. Although Reward Machines and similar formal specifications have a rich history of application towards sequential decision-making problems, they critically rely on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function--such ground-truth interpretations are elusive in the real world due in part to partial observability and noisy sensing. In this work, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that…
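To make the difficulty concrete, here is a minimal sketch, not the paper's algorithms, of a Reward Machine driven by a noisy detector. The task ("pick up coffee, then reach the office"), the state names `u0`/`u1`/`done`, the proposition names, and the detector probabilities are all hypothetical. The sketch compares two ways of tracking the Reward Machine state: thresholding the detector at every step, and propagating a belief over Reward Machine states under the simplifying assumption that detector outputs are independent across time steps.

```python
from collections import defaultdict

# Hypothetical task: "pick up coffee, then reach the office".
# TRANSITIONS[(rm_state, proposition)] = (next_rm_state, reward).
# Any (state, proposition) pair not listed is a zero-reward self-loop.
TRANSITIONS = {
    ("u0", "coffee"): ("u1", 0.0),
    ("u1", "office"): ("done", 1.0),
}


def rm_step(state, prop):
    """Advance the reward machine by one proposition."""
    return TRANSITIONS.get((state, prop), (state, 0.0))


def naive_run(detections, start="u0"):
    """Threshold each noisy detection to its most likely proposition."""
    state = start
    for probs in detections:
        prop = max(probs, key=probs.get)
        state, _ = rm_step(state, prop)
    return state


def belief_run(detections, start="u0"):
    """Propagate a distribution over reward machine states.

    Assumption (ours, for illustration): each step's detector output is
    independent of the others given the current reward machine state.
    """
    belief = {start: 1.0}
    for probs in detections:
        new_belief = defaultdict(float)
        for state, b in belief.items():
            for prop, p in probs.items():
                next_state, _ = rm_step(state, prop)
                new_belief[next_state] += b * p
        belief = dict(new_belief)
    return belief


# Hypothetical detector outputs: P(proposition holds) at each time step.
DETECTIONS = [
    {"coffee": 0.4, "none": 0.6},
    {"coffee": 0.4, "none": 0.6},
    {"office": 0.9, "none": 0.1},
]

naive_state = naive_run(DETECTIONS)
belief = belief_run(DETECTIONS)
print("naive RM state:", naive_state)
print("belief over RM states:", belief)
```

With these made-up numbers the thresholded run never registers the coffee pickup and stays in `u0`, while the belief assigns most of its mass to `done`. The independence assumption is a simplification chosen for brevity and may not hold when detector errors are correlated over time.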
Taxonomy
Topics: Electrostatic Discharge in Electronics · Advanced Memory and Neural Computing · Low-power high-performance VLSI design
