On the Role of Weight Sharing During Deep Option Learning
Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro

TL;DR
This paper investigates the impact of weight sharing in deep option learning within reinforcement learning. It finds that relaxing the assumption that the components of option-critic have independent parameters can improve training stability and speed, especially in complex environments like Atari games.
Contribution
It introduces new algorithms that optimize the full option-critic architecture with shared parameters, challenging previous assumptions of parameter independence.
Findings
Improved training stability in deep option learning.
Faster convergence in Atari game experiments.
Enhanced sample efficiency with shared parameters.
Abstract
The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over…
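To make the independence assumption concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical network in which a single shared trunk feeds three heads: the intra-option policies, the termination functions, and the option values used by the policy over options. The names `init_params`, `forward`, and `trunk_sensitivity`, the layer sizes, and the one-layer tanh trunk are all illustrative assumptions. Perturbing one trunk weight moves the outputs of all three heads, so the components do not have independent parameters once any weights are shared.

```python
import numpy as np

def init_params(rng, obs_dim, hidden, n_options, n_actions):
    """Hypothetical shared-trunk option-critic parameters (illustrative only)."""
    return {
        "trunk": rng.normal(0.0, 0.1, (obs_dim, hidden)),            # shared by every head
        "pi": rng.normal(0.0, 0.1, (hidden, n_options * n_actions)),  # intra-option policies
        "beta": rng.normal(0.0, 0.1, (hidden, n_options)),            # termination functions
        "q": rng.normal(0.0, 0.1, (hidden, n_options)),               # option values
    }

def forward(params, obs, n_options, n_actions):
    """Return (intra-option policies, termination probabilities, option values)."""
    h = np.tanh(obs @ params["trunk"])                  # shared features
    logits = (h @ params["pi"]).reshape(n_options, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    pi = np.exp(logits)
    pi = pi / pi.sum(axis=1, keepdims=True)
    beta = 1.0 / (1.0 + np.exp(-(h @ params["beta"])))   # sigmoid termination
    q = h @ params["q"]
    return pi, beta, q

def trunk_sensitivity(params, obs, n_options, n_actions, eps=1e-4):
    """Perturb one shared trunk weight and report the largest change in each head."""
    base = forward(params, obs, n_options, n_actions)
    perturbed = dict(params)
    perturbed["trunk"] = params["trunk"].copy()
    perturbed["trunk"][0, 0] += eps
    moved = forward(perturbed, obs, n_options, n_actions)
    return tuple(float(np.abs(m - b).max()) for m, b in zip(moved, base))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, obs_dim=4, hidden=8, n_options=2, n_actions=3)
    d_pi, d_beta, d_q = trunk_sensitivity(params, np.ones(4), 2, 3)
    print("change in policies, terminations, option values:", d_pi, d_beta, d_q)
```

In a tabular setting each component would have its own table entries, so an update to one would leave the others untouched. Here all three reported changes are non-zero, which is the situation the abstract describes for deep function approximation.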
