Temporal Abstraction in Reinforcement Learning with Offline Data
Ranga Shaarad Ayyagari, Anurita Ghosh, Ambedkar Dukkipati

TL;DR
This paper introduces an offline hierarchical reinforcement learning method that learns options from existing datasets, addressing the challenge of high sample complexity and distribution mismatch in complex, long-term planning tasks.
Contribution
It presents the first framework for offline hierarchical RL that learns options from datasets collected by unknown policies, enabling training without online interaction.
Findings
Effective in MuJoCo locomotion environments
Successful in robotic block-stacking tasks
Works in transfer and goal-conditioned settings
Abstract
Standard reinforcement learning algorithms with a single policy perform poorly on tasks in complex environments involving sparse rewards, diverse behaviors, or long-term planning. This has led to the study of algorithms that incorporate temporal abstraction by training a hierarchy of policies that plan over different time scales. The options framework was introduced to implement such temporal abstraction by learning low-level options that act as extended actions controlled by a high-level policy. The main challenge in applying these algorithms to real-world problems is their high sample complexity: training multiple levels of the hierarchy requires an amount of interaction that is impossible to obtain in online settings. Motivated by this, in this paper, we propose an offline hierarchical RL method that can learn options from existing offline datasets collected by other unknown agents. This is a very challenging…
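The abstract describes options as extended actions chosen by a high-level policy. The following is a minimal, generic sketch of that call-and-return execution scheme, not the paper's method. The toy 1-D chain environment, the fixed 3-step termination rule, and all names (`Option`, `step`, `run_option`, `rollout`, `OPTIONS`, `high_level`) are assumptions introduced purely for illustration, and the options here are hand-coded rather than learned from offline data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int  # position on a hypothetical 1-D chain


@dataclass
class Option:
    """An option: a low-level policy plus a termination condition."""
    name: str
    policy: Callable[[State], int]             # state -> primitive action (-1 or +1)
    terminates: Callable[[State, int], bool]   # (state, steps taken) -> stop?


def step(state: State, action: int, goal: State) -> Tuple[State, float, bool]:
    """Toy dynamics: move by `action`, clamp to [0, goal], sparse reward at the goal."""
    next_state = max(0, min(goal, state + action))
    done = next_state == goal
    return next_state, (1.0 if done else 0.0), done


def run_option(state: State, option: Option, goal: State,
               max_steps: int = 100) -> Tuple[State, float, int, bool]:
    """Execute one option until it terminates or the episode ends."""
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        state, reward, done = step(state, option.policy(state), goal)
        total_reward += reward
        steps += 1
        if option.terminates(state, steps):
            break
    return state, total_reward, steps, done


def rollout(start: State, goal: State,
            high_level: Callable[[State], str],
            options: Dict[str, Option],
            max_decisions: int = 50) -> List[Tuple[str, int, State]]:
    """The high-level policy picks an option; the option then acts for several steps."""
    state, done = start, False
    trace: List[Tuple[str, int, State]] = []
    for _ in range(max_decisions):
        if done:
            break
        option = options[high_level(state)]
        state, _, steps, done = run_option(state, option, goal)
        trace.append((option.name, steps, state))
    return trace


# Two hand-coded options that each run for at most 3 primitive steps (assumed rule).
OPTIONS = {
    "right": Option("right", lambda s: +1, lambda s, k: k >= 3),
    "left": Option("left", lambda s: -1, lambda s, k: k >= 3),
}


def high_level(state: State) -> str:
    """Trivial high-level policy for this toy problem: always head right."""
    return "right"


if __name__ == "__main__":
    print(rollout(start=0, goal=7, high_level=high_level, options=OPTIONS))
```

In this sketch the high-level policy makes only three decisions to cover seven primitive steps, which is the sense in which options let the two levels plan over different time scales.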
Taxonomy
Topics: Innovation Diffusion and Forecasting · Reinforcement Learning in Robotics
