Safe Policy Exploration Improvement via Subgoals
Brian Angulo, Gregory Gorbov, Aleksandr Panov, Konstantin Yakovlev

TL;DR
This paper introduces SPEIS, a reinforcement learning algorithm that decomposes an autonomous navigation problem into subgoals to improve exploration and performance under safety constraints.
Contribution
The paper proposes a learnable, end-to-end method that combines subgoal generation with safety-aware policy training, enhancing exploration under safety constraints.
Findings
Outperforms state-of-the-art methods in safety and success rate.
Reduces collision rate significantly in simulated environments.
Maintains high success rates while respecting safety limits.
Abstract
Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups. Still, it often struggles to reach distant goals when safety constraints are imposed (e.g., a wheeled robot is prohibited from moving close to obstacles). One of the main reasons for poor performance in such setups, which are common in practice, is that the need to respect the safety constraints degrades the exploration capabilities of an RL agent. To this end, we introduce SPEIS (Safe Policy Exploration Improvement via Subgoals), a novel learnable algorithm that decomposes the initial problem into smaller sub-problems via intermediate goals while respecting the limit on the cumulative safety constraints. It comprises two coupled policies trained end-to-end: subgoal and safe. The subgoal policy is…
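As a rough illustration of what a limit on cumulative safety constraints can look like, the sketch below sums per-step safety costs along one trajectory and compares the total with a budget. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the per-step cost signal, the optional discount factor `gamma`, and the `cost_limit` value are all hypothetical and do not come from the abstract.

```python
from typing import Sequence


def cumulative_cost(costs: Sequence[float], gamma: float = 1.0) -> float:
    """Return the (optionally discounted) sum of per-step safety costs.

    `costs` is a hypothetical per-step signal, e.g. 1.0 whenever the robot
    is closer to an obstacle than allowed and 0.0 otherwise.
    """
    total = 0.0
    discount = 1.0
    for cost in costs:
        total += discount * cost
        discount *= gamma
    return total


def within_safety_limit(
    costs: Sequence[float], cost_limit: float, gamma: float = 1.0
) -> bool:
    """Check whether one trajectory stays within the assumed cost budget."""
    return cumulative_cost(costs, gamma) <= cost_limit


if __name__ == "__main__":
    # Hypothetical trajectory: two unsafe steps out of five.
    episode_costs = [0.0, 1.0, 0.0, 1.0, 0.0]
    print(cumulative_cost(episode_costs))
    print(within_safety_limit(episode_costs, cost_limit=1.0))
```

A constraint of this kind bounds the total cost over an episode instead of forbidding every unsafe step outright. An agent may then spend part of its budget while exploring, which is one way to read the tension between safety and exploration that the abstract describes.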
Taxonomy
Topics: Bayesian Modeling and Causal Inference
