Situationally Aware Options
Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

TL;DR
This paper introduces a method for learning situationally aware options in hierarchical reinforcement learning, enabling agents to adapt behaviors like vigor and risk-averseness based on context, demonstrated in RoboCup and other domains.
Contribution
It proposes the SAP algorithm for learning reusable, situationally aware options with theoretical convergence guarantees, advancing hierarchical RL's adaptability and robustness.
Findings
Options learn to vary vigor in different scenarios
Agent exhibits human-like behaviors such as time-wasting
Method mitigates feature-based model misspecification
Abstract
Hierarchical abstractions, also known as options, are a type of temporally extended action (Sutton et al. 1999) that enables a reinforcement learning agent to plan at a higher level, abstracting away from the lower-level details. In this work, we learn reusable options whose parameters can vary, encouraging different behaviors, based on the current situation. In principle, these behaviors can include vigor, defence or even risk-averseness. These are some examples of what we refer to in the broader context as Situational Awareness (SA). We incorporate SA, in the form of vigor, into hierarchical RL by defining and learning situationally aware options in a Probabilistic Goal Semi-Markov Decision Process (PG-SMDP). This is achieved using our Situationally Aware oPtions (SAP) policy gradient algorithm which comes with a theoretical convergence guarantee. We learn reusable options in different…
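To make the idea of situation-dependent vigor concrete, here is a minimal toy sketch. It is not the SAP algorithm from the paper. It assumes, as the name "probabilistic goal" suggests, that the objective is the probability of meeting a reward threshold, and it uses a plain REINFORCE-style update on a one-step problem. The situations ("behind", "ahead"), the reward values, and the functions play and train are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def play(situation, vigorous, rng):
    """Return 1.0 if the (assumed) probabilistic goal is met, else 0.0.

    Hypothetical setup: when "behind" the agent needs a reward of 3,
    when "ahead" a reward of 1 is enough.
    """
    required = 3.0 if situation == "behind" else 1.0
    if vigorous:
        # Vigorous behavior: high reward half the time, nothing otherwise.
        reward = 3.0 if rng.random() < 0.5 else 0.0
    else:
        # Cautious behavior: small but certain reward.
        reward = 1.0
    return 1.0 if reward >= required else 0.0


def train(episodes=20000, lr=0.1, seed=0):
    """Learn one 'vigor' logit per situation with a likelihood-ratio update."""
    rng = random.Random(seed)
    theta = {"behind": 0.0, "ahead": 0.0}
    for _ in range(episodes):
        s = rng.choice(["behind", "ahead"])
        p = sigmoid(theta[s])          # probability of acting vigorously
        vigorous = rng.random() < p
        success = play(s, vigorous, rng)
        # Gradient of log Bernoulli(p) with respect to the logit is (a - p).
        grad_logp = (1.0 if vigorous else 0.0) - p
        # The return is the goal indicator, so this ascends P(goal met).
        theta[s] += lr * success * grad_logp
    return {s: sigmoid(t) for s, t in theta.items()}


if __name__ == "__main__":
    print(train())
```

In this toy setting the learned policy should become vigorous when "behind" (only the risky behavior can reach the threshold) and cautious when "ahead" (the certain reward already suffices), which loosely mirrors the time-wasting behavior listed under Findings.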
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques
