TL;DR
This paper introduces an adaptive method for multi-objective reinforcement learning that dynamically balances stability and non-convex solution discovery, improving policy optimization in complex robotic tasks.
Contribution
It proposes a novel conflict-driven controller that modulates the optimization landscape's curvature, enabling better exploration of non-convex Pareto fronts in deep RL.
Findings
Enables discovery of Pareto-optimal policies in non-convex regions.
Improves stability and robustness over static scalarization methods.
Validated on a robotic visual search task with positive results.
Abstract
Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while…
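To make the trade-off in the abstract concrete, here is a minimal sketch, not the authors' implementation. It compares the hard Tchebycheff scalarization with the standard log-sum-exp smoothing, in which a parameter `mu` controls how closely the smooth version tracks the hard maximum. The abstract does not specify the controller, so `gradient_conflict` and `adapt_mu` are hypothetical: they assume that conflict is measured by negative cosine similarity between two objective gradients and that `mu` is interpolated linearly between assumed bounds `mu_min` and `mu_max`. Objectives are written as losses to minimize, and `ideal` is an assumed reference point.

```python
import math


def tchebycheff(losses, weights, ideal):
    """Hard Tchebycheff scalarization: the worst weighted gap to the ideal point."""
    return max(w * (l - z) for l, w, z in zip(losses, weights, ideal))


def smooth_tchebycheff(losses, weights, ideal, mu):
    """Log-sum-exp smoothing of the Tchebycheff max.

    Approaches the hard max as mu -> 0; a larger mu gives a smoother surface.
    """
    scaled = [w * (l - z) / mu for l, w, z in zip(losses, weights, ideal)]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    return mu * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def gradient_conflict(g1, g2, eps=1e-12):
    """Hypothetical conflict signal in [0, 1]: 0 when gradients align, ~1 when opposed."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm = math.sqrt(sum(a * a for a in g1)) * math.sqrt(sum(b * b for b in g2))
    cos = dot / (norm + eps)
    return max(0.0, -cos)


def adapt_mu(conflict, mu_min=0.01, mu_max=1.0):
    """Hypothetical controller: small mu (sharp) at low conflict, large mu (smooth) at high."""
    return mu_min + (mu_max - mu_min) * conflict


losses, weights, ideal = [1.0, 3.0], [0.5, 0.5], [0.0, 0.0]
for grads in ([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]]):
    mu = adapt_mu(gradient_conflict(*grads))
    print(mu, smooth_tchebycheff(losses, weights, ideal, mu))
```

With aligned gradients the sketch selects the smallest `mu`, so the smooth value sits essentially on the hard Tchebycheff value; with opposed gradients it selects the largest `mu`, which gives a smoother but looser upper bound on the hard maximum. The linear schedule is only one possible choice and is not taken from the paper.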
