Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning
Duo Xu, Faramarz Fekri

TL;DR
This paper introduces LSTOC, a reinforcement learning algorithm that learns hidden subgoals and their temporal orderings using contrastive learning, enhancing efficiency and generalization in complex tasks with temporal constraints.
Contribution
LSTOC is the first method to simultaneously learn hidden subgoals and their temporal orderings in RL using a contrastive learning approach and subgoal trees.
Findings
LSTOC outperforms baseline methods in various environments.
It improves sample efficiency in discovering subgoals.
It generalizes well to unseen tasks.
Abstract
In real-world applications, the success of a task is often determined by multiple key steps that are far apart in time and must be achieved in a fixed order. For example, the key steps listed in a cooking recipe should be achieved one by one in the right order. These key steps can be regarded as subgoals of the task, and their orderings are described as temporal ordering constraints. However, in many real-world problems, subgoals or key states are often hidden in the state space and their temporal ordering constraints are also unknown, which makes it challenging for previous RL algorithms to solve this kind of task. To address this issue, in this work we propose a novel RL algorithm for learning hidden subgoals under temporal ordering constraints (LSTOC). We propose a new contrastive learning objective which can…
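To make the notion of a temporal ordering constraint concrete, here is a minimal Python sketch that checks whether a trajectory visits a list of subgoals in the required order. It is only an illustration of the problem setting, not the LSTOC algorithm: the function name `satisfies_ordering`, the string-valued states, and the recipe steps are all hypothetical, and the sketch assumes the subgoals are already known, whereas the paper addresses the case where they are hidden.

```python
from typing import Hashable, Sequence


def satisfies_ordering(
    trajectory: Sequence[Hashable], subgoals: Sequence[Hashable]
) -> bool:
    """Return True if the trajectory reaches every subgoal in the given order.

    Assumption for this sketch: a subgoal counts as achieved the first time
    its state is visited after all earlier subgoals have been achieved.
    Visiting a later subgoal too early is ignored rather than penalized.
    """
    next_idx = 0  # index of the subgoal we are currently waiting for
    for state in trajectory:
        if next_idx < len(subgoals) and state == subgoals[next_idx]:
            next_idx += 1
    return next_idx == len(subgoals)


# Hypothetical recipe-style task: chop, then boil, then serve.
recipe = ["chop", "boil", "serve"]
in_order = ["start", "chop", "walk", "boil", "walk", "serve"]
out_of_order = ["start", "boil", "chop", "serve"]

print(satisfies_ordering(in_order, recipe))
print(satisfies_ordering(out_of_order, recipe))
```

In the second trajectory, "boil" occurs before "chop" and never again afterwards, so the ordering constraint is not met even though every key state is visited.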
Taxonomy
Topics: Evolutionary Algorithms and Applications
Methods: Contrastive Learning
