A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments
Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu

TL;DR
This paper presents a new multi-objective reinforcement learning algorithm that adaptively evolves policy coverage sets in non-stationary environments, outperforming existing methods in dynamic settings.
Contribution
It introduces a robust, online developmental optimization algorithm for evolving policy coverage sets in non-stationary environments, addressing a key limitation of prior methods: their optimization procedures assume the environment is stationary.
Findings
Outperforms existing algorithms in non-stationary environments
Achieves results comparable to state-of-the-art methods in stationary environments
Demonstrates robustness and adaptability in dynamic settings
Abstract
Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity to evolve a coverage set of policies that can solve the problem. This paper introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a…
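The abstract centers on a coverage set of policies selected over a preference space. The following is a minimal sketch that assumes linear scalarization of the objectives, which is common in multi-objective RL but is not necessarily what this paper uses. The hypothetical helpers `pareto_front`, `best_policy_for_preference` and `coverage_set_over_weights` take a few policies' value vectors. They return the Pareto-optimal policies and the subset that is best for at least one of the given preference weight vectors. All numbers are made up for illustration and are not results from the paper.

```python
import numpy as np


def pareto_front(values: np.ndarray) -> list:
    """Indices of policies whose value vectors are not Pareto-dominated (maximization)."""
    keep = []
    for i, v in enumerate(values):
        # v is dominated if some other vector is >= in every objective and > in at least one.
        dominated = any(
            np.all(u >= v) and np.any(u > v)
            for j, u in enumerate(values)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


def best_policy_for_preference(values: np.ndarray, w: np.ndarray) -> int:
    """Linear scalarization: index of the policy maximizing w . V(pi)."""
    return int(np.argmax(values @ w))


def coverage_set_over_weights(values: np.ndarray, weights: np.ndarray) -> list:
    """Union of scalarized-optimal policies over a finite set of preference weights.

    This approximates a convex coverage set; a coarse weight grid can miss policies.
    """
    return sorted({best_policy_for_preference(values, w) for w in weights})


# Hypothetical value vectors V(pi) for five policies over two objectives.
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.4, 0.4], [0.2, 0.8]])
# Preference weights on the 2-simplex (dyadic values, so dot products are exact-ish).
W = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]])

print("Pareto front:", pareto_front(V))
print("Coverage set:", coverage_set_over_weights(V, W))
```

Policy 3 is dominated by policy 2. Policy 4 is Pareto-optimal, but it lies in a concave region of the front, so no linear preference selects it. The sketch computes the set once from fixed value estimates. It does not model the online evolution under non-stationary dynamics that the paper proposes.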
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics
