Deep Reinforcement Learning Behavioral Mode Switching Using Optimal Control Based on a Latent Space Objective
Sindre Benjamin Remman, Bjørn Andreas Kristiansen, Anastasios M. Lekkas

TL;DR
This paper introduces a method to modify deep reinforcement learning policies by controlling their latent space with optimal control, enabling switching between behavioral modes to improve task success.
Contribution
It presents a novel approach to identify and manipulate behavioral modes in RL policies through latent space optimization using optimal control techniques.
Findings
Successfully switches behavioral modes in the lunar lander environment
Imposes desired behaviors to convert failures into successes
Provides a new interpretability filter for neural network policies
Abstract
In this work, we use optimal control to change the behavior of a deep reinforcement learning policy by optimizing directly in the policy's latent space. We hypothesize that distinct behavioral patterns, termed behavioral modes, can be identified within certain regions of a deep reinforcement learning policy's latent space, meaning that specific actions or strategies are preferred within these regions. We identify these behavioral modes using latent space dimension-reduction with pairwise controlled manifold approximation (PaCMAP). Using the actions generated by the optimal control procedure, we move the system from one behavioral mode to another. We subsequently utilize these actions as a filter for interpreting the neural network policy. The results show that this approach can impose desired behavioral modes in the policy, demonstrated by showing how a failed episode can be made successful and vice versa using the lunar…
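The idea of steering a system toward a target region of a policy's latent space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-dimensional `encode` function stands in for the hidden layers of a trained policy, `step` is a toy dynamics model rather than the lunar lander, and the brute-force finite-horizon search in `plan` is a simple substitute for whatever optimal control solver the authors use.

```python
import itertools
import math

# Hypothetical stand-in for the hidden layers of a trained policy network.
W = [[1.0, -0.5], [0.3, 0.8]]
B = [0.1, -0.2]

# Hypothetical discrete action set: stay put or nudge one state coordinate.
ACTIONS = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]


def encode(state):
    """Map a 2-D state to a 2-D latent vector (toy policy encoder)."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
            for row, b in zip(W, B)]


def step(state, action):
    """Toy known dynamics: the action displaces the state directly."""
    return [s + a for s, a in zip(state, action)]


def latent_cost(state, z_target):
    """Squared distance between the state's latent vector and the target."""
    return sum((z - t) ** 2 for z, t in zip(encode(state), z_target))


def plan(state, z_target, horizon=3):
    """Search all action sequences of the given horizon and return the
    first action of the sequence with the lowest summed latent cost."""
    best_cost, best_first = float("inf"), ACTIONS[0]
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, cost = state, 0.0
        for action in seq:
            s = step(s, action)
            cost += latent_cost(s, z_target)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run_mpc(state, z_target, n_steps=20):
    """Receding-horizon loop: replan at every step, apply the first action."""
    for _ in range(n_steps):
        state = step(state, plan(state, z_target))
    return state


if __name__ == "__main__":
    start = [1.0, -1.0]
    z_target = encode([0.0, 0.5])  # latent point standing in for a desired mode
    final = run_mpc(start, z_target)
    print("latent cost before:", latent_cost(start, z_target))
    print("latent cost after: ", latent_cost(final, z_target))
```

The cost is defined on the latent vector rather than on the state, which is the point the sketch is meant to convey: the controller is asked to reach a region of the policy's internal representation, not a particular physical configuration.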
Taxonomy
Topics: Greenhouse Technology and Climate Control · Energy, Environment, Agriculture Analysis · Reinforcement Learning in Robotics
