Multi-Objective Policy Gradients with Topological Constraints
Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

TL;DR
This paper extends topological Markov decision processes (TMDPs) to continuous spaces and unknown transition dynamics. It proves a policy gradient theorem for TMDPs and presents a PPO-based algorithm for multi-objective problems with ordered constraints.
Contribution
It formulates and proves a policy gradient theorem for TMDPs in continuous spaces, enabling new algorithms that incorporate topological constraints into deep RL.
Findings
Successful implementation of the TMDP policy gradient algorithm
Effective navigation in real-world multi-objective robot tasks
Generalization of existing DRL methods to topologically constrained problems
Abstract
Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently developed theory of topological Markov decision processes (TMDPs) captures this range of problems for the case of discrete states and actions. In this work, we extend TMDPs towards continuous spaces and unknown transition dynamics by formulating, proving, and implementing the policy gradient theorem for TMDPs. This theoretical result enables the creation of TMDP learning algorithms that use function approximators, and can generalize existing deep reinforcement learning (DRL) approaches. Specifically, we present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm. We demonstrate this on a…
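For orientation, the sketch below shows the standard PPO clipped surrogate that the abstract's algorithm builds on, evaluated once per objective. It is only an illustration under stated assumptions: the function names, the `advantages_by_objective` layout, and the idea of computing one surrogate per objective are hypothetical, and the way the paper enforces the ordering between objectives is not described in this summary and is not implemented here.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over the samples.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: one estimate per sample.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # importance ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

def per_objective_surrogates(logp_new, logp_old, advantages_by_objective,
                             clip_eps=0.2):
    """Hypothetical helper: one clipped surrogate per objective.

    Assumption: each objective has its own advantage estimates over the same
    batch. How these surrogates are combined under the TMDP ordering is left
    out, because this summary does not specify it.
    """
    return [
        clipped_surrogate(logp_new, logp_old, adv, clip_eps)
        for adv in advantages_by_objective
    ]

if __name__ == "__main__":
    logp_old = [0.0, 0.0]
    logp_new = [0.0, 0.0]  # identical policies, so every ratio is 1
    advantages = [[1.0, 3.0], [-2.0, 0.0]]  # two objectives, two samples
    print(per_objective_surrogates(logp_new, logp_old, advantages))
```

When the two policies coincide, each surrogate reduces to the mean advantage for that objective, which is a convenient sanity check before adding any constraint handling.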
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms
