Learning Skills to Navigate without a Master: A Sequential Multi-Policy Reinforcement Learning Algorithm
Ambedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari, Dhaval, Parmar Udaybhai

TL;DR
This paper introduces Sequential Soft Option Critic, a reinforcement learning method that learns skills sequentially without hierarchical policies, improving performance on navigation and goal-based tasks in complex environments.
Contribution
The paper presents a novel sequential learning approach that avoids hierarchical policies, enhancing generalization and performance in complex reinforcement learning tasks.
Findings
Outperforms Soft Actor-Critic and Soft Option Critic on various benchmarks.
Effective in 3D navigation, Atari River Raid, and self-driving car simulations.
Improves skill learning without hierarchical policy structures.
Abstract
Solving complex problems using reinforcement learning necessitates breaking down the problem into manageable tasks and learning policies to solve these tasks. These policies, in turn, have to be controlled by a master policy that takes high-level decisions. Hence learning policies involves hierarchical decision structures. However, training such methods in practice may lead to poor generalization, with either sub-policies executing actions for too few time steps or devolving into a single policy altogether. In our work, we introduce an alternative approach to learn such skills sequentially without using an overarching hierarchical policy. We propose this method in the context of environments where a major component of the objective of a learning agent is to prolong the episode for as long as possible. We refer to our proposed method as Sequential Soft Option Critic. We demonstrate the…
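The abstract names two failure modes of hierarchical training: sub-policies that act for too few time steps, and sub-policies that collapse into a single policy. Both can be checked from a logged sequence of which sub-policy (option) was active at each step. The sketch below is a hypothetical diagnostic, not the paper's Sequential Soft Option Critic. The function names `option_statistics` and `diagnose` and the thresholds `min_duration` and `max_share` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def option_statistics(active_options: List[int]) -> Tuple[float, Dict[int, float]]:
    """Return (mean contiguous segment length, fraction of steps per option)."""
    if not active_options:
        raise ValueError("trajectory must contain at least one step")
    # A new segment starts whenever the active option changes between steps.
    segments = 1
    for prev, cur in zip(active_options, active_options[1:]):
        if cur != prev:
            segments += 1
    mean_duration = len(active_options) / segments
    # Share of all time steps spent under each option.
    counts = Counter(active_options)
    usage = {opt: n / len(active_options) for opt, n in counts.items()}
    return mean_duration, usage


def diagnose(active_options: List[int],
             min_duration: float = 2.0,   # hypothetical threshold
             max_share: float = 0.95) -> List[str]:  # hypothetical threshold
    """Flag the two failure modes described in the abstract."""
    mean_duration, usage = option_statistics(active_options)
    issues = []
    if mean_duration < min_duration:
        issues.append("options switch too often")
    if max(usage.values()) > max_share:
        issues.append("one option dominates")
    return issues


print(diagnose([0, 0, 1, 0, 1, 1]))  # short segments
print(diagnose([2] * 20))            # collapse to a single option
print(diagnose([0, 0, 0, 1, 1, 1]))  # neither failure mode
```

A trajectory such as `[0, 0, 1, 0, 1, 1]` has four segments over six steps (mean length 1.5) and is flagged for switching too often. A trajectory that uses only one option is flagged as collapsed.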
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Experience Replay · Dense Connections · Adam · Soft Actor Critic
