Reinforcement Learning for Long-Horizon Unordered Tasks: From Boolean to Coupled Reward Machines

Kristina Levina; Nikolaos Pappas; Athanasios Karapantelakis; Aneta Vulgarakis Feljan; Jendrik Seipp

arXiv:2510.27329 · cs.AI · November 3, 2025

Open Access

TL;DR

This paper introduces three generalizations of reward machines and a new learning algorithm, CoRM, to improve reinforcement learning efficiency in long-horizon, unordered subtask environments.

Contribution

The paper proposes Numeric, Agenda, and Coupled Reward Machines, together with the CoRM learning algorithm, to address the poor scalability of learning with reward machines in long-horizon tasks whose subtasks can be completed in any order.

Findings

1. CoRM outperforms existing algorithms in long-horizon scenarios.

2. Coupled RMs enable more efficient learning in complex tasks.

3. The approach scales better with the number of unordered subtasks.

Abstract

Reward machines (RMs) inform reinforcement learning agents about the reward structure of the environment. This is particularly advantageous for complex non-Markovian tasks because agents with access to RMs can learn more efficiently from fewer samples. However, learning with RMs is ill-suited for long-horizon problems in which a set of subtasks can be executed in any order. In such cases, the amount of information to learn increases exponentially with the number of unordered subtasks. In this work, we address this limitation by introducing three generalisations of RMs: (1) Numeric RMs allow users to express complex tasks in a compact form. (2) In Agenda RMs, states are associated with an agenda that tracks the remaining subtasks to complete. (3) Coupled RMs have coupled states associated with each subtask in the agenda. Furthermore, we introduce a new compositional learning algorithm…
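The exponential growth mentioned in the abstract can be illustrated with a small sketch. It assumes a plain RM that has to remember which of n unordered subtasks are already done, so each RM state is encoded as the set of completed subtasks. This encoding and the names `unordered_rm_states` and `transitions` are illustrative assumptions, not the paper's construction or its CoRM algorithm.

```python
from itertools import combinations


def unordered_rm_states(subtasks):
    """Enumerate the states of a plain RM that remembers which subtasks are done.

    Assumption for illustration: each state is the frozenset of completed
    subtasks. This is not the construction used in the paper.
    """
    items = sorted(subtasks)
    return [
        frozenset(combo)
        for k in range(len(items) + 1)
        for combo in combinations(items, k)
    ]


def transitions(states, subtasks):
    """Completing a subtask that is not yet done moves to the state that includes it."""
    return {(s, t): s | {t} for s in states for t in subtasks if t not in s}


if __name__ == "__main__":
    for n in (2, 4, 8):
        tasks = [f"task{i}" for i in range(n)]
        states = unordered_rm_states(tasks)
        # State count is 2**n; transition count is n * 2**(n - 1).
        print(n, len(states), len(transitions(states, tasks)))
```

Under this encoding, doubling the number of subtasks from 4 to 8 takes the state count from 16 to 256, which is the kind of blow-up the generalised RMs are meant to avoid.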

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Software Engineering Methodologies · Explainable Artificial Intelligence (XAI)