PRISM: Parallel Reward Integration with Symmetry for MORL
Finn van der Knaap, Kejiang Qian, Zheng Xu, Fengxiang He

TL;DR
PRISM introduces a symmetry-based approach to improve sample efficiency and performance in heterogeneous Multi-Objective Reinforcement Learning by aligning reward channels and constraining policy search.
Contribution
It proposes ReSymNet, which reconciles temporal-frequency mismatches across objectives, and SymReg, which enforces reflectional symmetry on the policy, to improve learning and generalization in MORL.
Findings
Outperforms baseline and oracle methods on MuJoCo benchmarks
Achieves hypervolume gains of over 100% relative to the sparse-reward baseline
Improves Pareto coverage and distributional balance
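To make the hypervolume figure concrete, here is a minimal sketch of how a two-objective hypervolume and a relative gain could be computed. It assumes both objectives are maximised and that a reference point has been fixed. The function names, example fronts, and reference point are hypothetical, and the paper's own evaluation protocol may differ.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximised)."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        # Only a point that raises the best second objective adds a new strip of area.
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def relative_gain(hv_new, hv_base):
    """Relative hypervolume gain of `hv_new` over `hv_base`, as a fraction."""
    return (hv_new - hv_base) / hv_base


# Illustrative (made-up) fronts, not results from the paper.
baseline_front = [(1.0, 1.0)]
improved_front = [(2.0, 1.0), (1.0, 2.0)]
ref_point = (0.0, 0.0)

hv_base = hypervolume_2d(baseline_front, ref_point)
hv_new = hypervolume_2d(improved_front, ref_point)
gain = relative_gain(hv_new, hv_base)
```

Under this definition, a gain above 1.0 corresponds to a hypervolume gain of over 100%, meaning the dominated area has more than doubled.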
Abstract
This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces…
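The idea behind a reflectional equivariance regulariser can be illustrated with a small sketch. It assumes the reflection acts on states and actions as per-coordinate sign flips and that the penalty is the mean squared gap between "mirror, then act" and "act, then mirror". These operators, the linear policy, and all names below are hypothetical illustrations, not the paper's SymReg implementation.

```python
def reflect(vec, signs):
    """Apply a reflection modelled as per-coordinate sign flips (an assumption)."""
    return [s * v for s, v in zip(signs, vec)]


def linear_policy(weights, state):
    """Toy deterministic policy: action = weights @ state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def equivariance_penalty(policy, states, state_signs, action_signs):
    """Mean squared gap between policy(reflect(s)) and reflect(policy(s))."""
    total = 0.0
    for s in states:
        act_on_mirrored = policy(reflect(s, state_signs))
        mirrored_action = reflect(policy(s), action_signs)
        total += sum((a - b) ** 2 for a, b in zip(act_on_mirrored, mirrored_action))
    return total / len(states)


# Hypothetical setup: the second state coordinate and the single action flip sign.
state_signs = [1.0, -1.0]
action_signs = [-1.0]
states = [[1.0, 0.5], [-2.0, 1.0]]

# This policy depends only on the flipped coordinate, so it is equivariant.
symmetric = lambda s: linear_policy([[0.0, 2.0]], s)
# This policy also uses the unflipped coordinate, which breaks equivariance.
asymmetric = lambda s: linear_policy([[1.0, 2.0]], s)

penalty_symmetric = equivariance_penalty(symmetric, states, state_signs, action_signs)
penalty_asymmetric = equivariance_penalty(asymmetric, states, state_signs, action_signs)
```

In a training loop, a term like this would typically be added to the policy loss with some weight. A zero penalty means the policy already lies in the reflection-equivariant subspace for the sampled states.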
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
