Temporal Abstraction in Reinforcement Learning with Offline Data

Ranga Shaarad Ayyagari; Anurita Ghosh; Ambedkar Dukkipati

arXiv:2407.15241·cs.LG·July 23, 2024

TL;DR

This paper introduces an offline hierarchical reinforcement learning method that learns options from existing datasets, addressing the challenge of high sample complexity and distribution mismatch in complex, long-term planning tasks.

Contribution

The paper presents the first framework for offline hierarchical RL that learns options from datasets collected by unknown policies, enabling training without online interaction.

Findings

01. Effective in MuJoCo locomotion environments

02. Successful in robotic block-stacking tasks

03. Works in transfer and goal-conditioned settings

Abstract

Standard reinforcement learning algorithms with a single policy perform poorly on tasks in complex environments involving sparse rewards, diverse behaviors, or long-term planning. This led to the study of algorithms that incorporate temporal abstraction by training a hierarchy of policies that plan over different time scales. The options framework has been introduced to implement such temporal abstraction by learning low-level options that act as extended actions controlled by a high-level policy. The main challenge in applying these algorithms to real-world problems is that they suffer from high sample complexity to train multiple levels of the hierarchy, which is impossible in online settings. Motivated by this, in this paper, we propose an offline hierarchical RL method that can learn options from existing offline datasets collected by other unknown agents. This is a very challenging…
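As a rough illustration of the options framework mentioned in the abstract, the sketch below shows call-and-return execution: a high-level policy picks an option, and that option's low-level policy acts until its termination condition fires. This is a generic toy, not the paper's offline method; the 1-D chain environment and the names `Option`, `run_hierarchy`, and `high_level_policy` are hypothetical and chosen only for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int   # position on a 1-D chain (toy assumption)
Action = int  # -1 (step left) or +1 (step right)


@dataclass
class Option:
    """A temporally extended action: a low-level policy plus a termination rule."""
    name: str
    policy: Callable[[State], Action]         # low-level action choice
    terminate_prob: Callable[[State], float]  # probability of stopping in a state


def run_hierarchy(
    start: State,
    goal: State,
    options: List[Option],
    high_level_policy: Callable[[State], int],
    rng: random.Random,
    max_steps: int = 100,
) -> Tuple[State, List[Tuple[str, State]]]:
    """Call-and-return execution: pick an option, follow it until it terminates."""
    state, trace = start, []
    while state != goal and len(trace) < max_steps:
        option = options[high_level_policy(state)]  # high-level decision
        while True:
            state += option.policy(state)           # one primitive step
            trace.append((option.name, state))
            done = state == goal or len(trace) >= max_steps
            if done or rng.random() < option.terminate_prob(state):
                break                               # hand control back up
    return state, trace


# Two hand-written options; in practice both levels would be learned.
options = [
    Option("left", policy=lambda s: -1, terminate_prob=lambda s: 0.25),
    Option("right", policy=lambda s: +1, terminate_prob=lambda s: 0.25),
]
goal = 5
final_state, trace = run_hierarchy(
    start=0,
    goal=goal,
    options=options,
    high_level_policy=lambda s: 0 if s > goal else 1,
    rng=random.Random(0),
)
```

In this toy run the high-level policy always selects the "right" option, so the agent reaches the goal in five primitive steps no matter how often the option terminates and is re-selected. The random termination only changes how many high-level decisions are made along the way, which is the time-scale separation that temporal abstraction relies on.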

Taxonomy

Topics: Innovation Diffusion and Forecasting · Reinforcement Learning in Robotics