Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning
Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes Andreas Stork

TL;DR
This paper introduces PSQD, a novel algorithm for lexicographic multi-objective reinforcement learning that efficiently learns, reuses, and adapts subtask solutions in continuous state-action spaces while respecting the priority order among subtasks.
Contribution
The paper proposes prioritized soft Q-decomposition (PSQD), enabling zero-shot reuse and offline adaptation of subtask solutions under lexicographic priorities in continuous RL tasks.
Findings
Successful learning and adaptation in robot control tasks
Effective reuse of subtask solutions without additional environment interaction
Maintains subtask priorities during learning, outperforming baselines
Abstract
Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions…
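The abstract's core idea, turning a prioritized subtask list into one scalar problem and combining subtask value functions, can be illustrated with a small sketch. This is not the authors' implementation; it assumes a finite set of candidate actions at a single state (for example, actions sampled from the continuous action space), and it reads "subtask transformation" as a threshold rule: each higher-priority subtask keeps only actions whose Q-value is within a tolerance epsilon of its best still-allowed action. It reads "value decomposition" as summing the subtask Q-values over the surviving actions and applying a soft (Boltzmann) policy. The function name `lexicographic_soft_policy` and the parameters `subtask_qs`, `epsilons`, and `alpha` are hypothetical.

```python
import math


def lexicographic_soft_policy(subtask_qs, epsilons, alpha=1.0):
    """Hypothetical sketch: soft policy over candidate actions under priorities.

    subtask_qs: per-subtask lists of Q-values, highest priority first;
        subtask_qs[i][k] stands for Q_i(s, a_k) at one fixed state s.
    epsilons: one tolerance per subtask except the lowest-priority one.
    alpha: temperature of the Boltzmann (soft) policy.
    Returns action probabilities; actions ruled out by a higher-priority
    subtask get probability 0.0.
    """
    if len(epsilons) != len(subtask_qs) - 1:
        raise ValueError("need one tolerance per higher-priority subtask")
    n_actions = len(subtask_qs[0])
    allowed = [True] * n_actions

    # Assumed transformation: each higher-priority subtask keeps only actions
    # within epsilon of its best action among those still allowed.
    for q, eps in zip(subtask_qs[:-1], epsilons):
        best = max(q[k] for k in range(n_actions) if allowed[k])
        allowed = [allowed[k] and best - q[k] <= eps for k in range(n_actions)]

    # Assumed decomposition: the composite score is the sum of all subtask
    # Q-values, defined only on actions that survived every filter.
    scores = [
        sum(q[k] for q in subtask_qs) / alpha if allowed[k] else -math.inf
        for k in range(n_actions)
    ]

    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two subtasks over four candidate actions: a high-priority subtask
# (e.g. obstacle avoidance) and a low-priority one (e.g. goal reaching).
probs = lexicographic_soft_policy(
    [[1.0, 0.95, 0.2, 0.7], [0.0, 2.0, 5.0, 1.0]], epsilons=[0.1]
)
```

In this usage example, action 2 has the highest Q-value for the low-priority subtask, yet it receives zero probability because it falls outside the high-priority subtask's tolerance. The low-priority subtask can only choose among actions the high-priority subtask is indifferent between. Because the sketch only recombines Q-values that already exist, it also hints at how previously learned subtask solutions could be reused without new environment interaction.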
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Software Engineering Research
