TL;DR
This paper introduces a method for interpolating optimal value functions across reward weights in multi-objective reinforcement learning, so that a change in preferences does not require solving the problem again. The method is demonstrated on examples in both discrete and continuous domains.
Contribution
The paper proves that the optimal value function changes smoothly as the reward weights change, and it uses Gaussian process interpolation to approximate the optimal value function at weights for which the optimization problem was never solved.
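The idea can be illustrated with a small sketch. It is not the paper's implementation: the five-state chain MDP, the two "reach an end of the chain" objectives, the RBF kernel, the length scale, and every function name below are assumptions chosen only for illustration. The sketch solves the weighted-sum problem exactly at a few weights with value iteration, then uses a hand-written Gaussian process posterior mean to estimate the optimal values at a weight that was not solved.

```python
import numpy as np

N_STATES, GAMMA = 5, 0.9  # hypothetical toy chain MDP, not from the paper


def next_states():
    """Deterministic transitions: action 0 moves left, action 1 moves right."""
    s = np.arange(N_STATES)
    return np.stack([np.maximum(s - 1, 0), np.minimum(s + 1, N_STATES - 1)], axis=1)


def build_rewards():
    """Two objectives: reward 1 for landing on the right end, or on the left end."""
    nxt = next_states()
    r_right = np.zeros((N_STATES, 2))
    r_left = np.zeros((N_STATES, 2))
    r_right[nxt == N_STATES - 1] = 1.0
    r_left[nxt == 0] = 1.0
    return r_right, r_left


def optimal_values(w, n_iter=500):
    """Value iteration for the weighted-sum reward w * r_right + (1 - w) * r_left."""
    nxt = next_states()
    r_right, r_left = build_rewards()
    reward = w * r_right + (1.0 - w) * r_left
    v = np.zeros(N_STATES)
    for _ in range(n_iter):
        v = np.max(reward + GAMMA * v[nxt], axis=1)
    return v


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of weights."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_mean(w_train, v_train, w_query, jitter=1e-8):
    """GP posterior mean over weights, one shared kernel for all states."""
    mu = v_train.mean(axis=0)  # center the targets per state
    k = rbf(w_train, w_train) + jitter * np.eye(len(w_train))
    alpha = np.linalg.solve(k, v_train - mu)
    return mu + rbf(w_query, w_train) @ alpha


# Solve exactly at six weights, then interpolate at an unsolved weight.
w_train = np.linspace(0.0, 1.0, 6)
v_train = np.array([optimal_values(w) for w in w_train])  # shape (6, N_STATES)
v_interp = gp_mean(w_train, v_train, np.array([0.3]))[0]
v_exact = optimal_values(0.3)
print("max abs error at w=0.3:", np.max(np.abs(v_interp - v_exact)))
```

In this sketch the expensive step (`optimal_values`) runs only at the training weights; a query at a new weight costs one kernel evaluation and a matrix product. How close the interpolated values are to the exact ones depends on the kernel, the length scale, and the spacing of the solved weights, all of which are arbitrary choices here.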
Findings
Interpolation provides robust value estimates for sample states.
The method enables instant preference adaptation in autonomous vehicles.
It is effective in both discrete and continuous domains.
Abstract
A common approach for defining a reward function for Multi-objective Reinforcement Learning (MORL) problems is the weighted sum of the multiple objectives. The weights are then treated as design parameters dependent on the expertise (and preference) of the person performing the learning, with the typical result that a new solution is required for any change in these settings. This paper investigates the relationship between the reward function and the optimal value function for MORL, specifically addressing the question of how to approximate the optimal value function well beyond the set of weights for which the optimization problem was actually solved, thereby avoiding the need to recompute for any particular choice. We prove that the value function transforms smoothly given a transformation of weights of the reward function (and thus a smooth interpolation in the policy space). A…
Taxonomy
Methods: Gaussian Process
