Learning Control Policies for Variable Objectives from Offline Data

Marc Weber, Phillip Swazinna, Daniel Hein, Steffen Udluft, and Volkmar Sterzing

arXiv:2308.06127·cs.LG·January 4, 2024

Open Access

TL;DR

This paper introduces variable objective policies (VOPs) in offline reinforcement learning, enabling control policies to adapt to different objectives at runtime without additional data or re-training.

Contribution

The paper extends model-based policy search methods so that a single policy generalizes over multiple objectives, which parameterize the reward function.

Findings

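01. Policies can be adjusted at runtime by changing the objectives passed to them as input.

02. No additional data collection or re-training is needed to target different objectives.

03. Users can re-balance optimization targets at runtime, which makes control strategies more flexible.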

Abstract

Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a variety of objectives, which parameterize the reward function. We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, without the need to collect additional observation batches or re-train.
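The idea of passing the objective to the policy as an input can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D toy system, the linear transition model, the linear policy, the one-step squared-error reward, and all names (`policy`, `run`, `goals`, `theta`) are assumptions chosen for brevity. A transition model is fitted to a fixed offline batch, and the policy is then optimized against that model while objectives (here, target setpoints) are sampled at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline batch from a toy 1-D system: x' = 0.9*x + 0.5*a + noise.
x = rng.uniform(-2.0, 2.0, size=500)
a = rng.uniform(-1.0, 1.0, size=500)
x_next = 0.9 * x + 0.5 * a + 0.01 * rng.normal(size=500)

# Fit a linear transition model x' ~ A*x + B*a from the batch (least squares).
coef, *_ = np.linalg.lstsq(np.stack([x, a], axis=1), x_next, rcond=None)
A, B = coef


def policy(theta, state, goal):
    """Linear policy that receives the objective (goal) as an extra input."""
    return theta[0] * state + theta[1] * goal


# Sample one objective per training state so the policy sees many objectives.
goals = rng.uniform(-1.0, 1.0, size=x.shape)

# Policy search on the learned model: the reward is -(x' - goal)^2, so we
# minimize the squared error of the model's one-step prediction.
theta = np.zeros(2)
lr = 0.5
for _ in range(500):
    pred = A * x + B * policy(theta, x, goals)
    err = pred - goals
    grad = np.array([np.mean(2 * err * B * x), np.mean(2 * err * B * goals)])
    theta = theta - lr * grad


def run(goal, state=0.0, steps=5):
    """Apply the trained policy to the true toy system for a few steps."""
    for _ in range(steps):
        state = 0.9 * state + 0.5 * policy(theta, state, goal)
    return state


# Change the objective at runtime; no new data and no re-training involved.
for goal in (-0.5, 0.3, 1.0):
    print(goal, run(goal))
```

In this sketch the same trained `theta` serves every setpoint, because the objective is sampled during training and supplied as an input at runtime. A realistic setting would use a nonlinear model and policy and multi-step rollouts.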

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Control Systems Optimization