PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning
Ziqin Yuan, Ruiqi Wang, Dezhong Zhao, Baijian Yang, Byung-Cheol Min

TL;DR
PrefMoE introduces a mixture-of-experts framework for preference modeling in reinforcement learning, enhancing robustness to heterogeneous and conflicting preference data.
Contribution
The paper proposes a mixture-of-experts reward learning method with adaptive trajectory-level routing and load balancing, which improves robustness over fitting a single reward model.
Findings
PrefMoE outperforms single-model baselines in preference prediction accuracy.
PrefMoE leads to more reliable policy learning in locomotion and manipulation tasks.
The framework effectively captures diverse latent preferences under noisy supervision.
Abstract
Preference-based reinforcement learning offers a scalable alternative to manual reward engineering by learning reward structures from comparative feedback. However, large-scale preference datasets, whether collected from crowdsourced annotators or generated by synthetic teachers, often contain heterogeneous and partially conflicting supervision, including disagreement across annotators and inconsistency within annotators. Existing reward learning methods typically fit a single reward model to such data, forcing it to average incompatible signals and thereby limiting robustness. To solve this, we propose PrefMoE, a mixture-of-experts reward learning framework for robust preference modeling. PrefMoE learns multiple specialized reward experts and uses trajectory-level soft routing to combine them adaptively, enabling the model to capture diverse latent preference patterns under noisy and…
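The idea of combining several reward experts through trajectory-level soft routing can be illustrated with a small sketch. This is not the authors' implementation. The linear experts, the mean-pooled linear router, the Bradley-Terry preference loss, and the squared-deviation load-balancing penalty are all assumptions chosen for illustration, and every name (`MoEReward`, `gate`, `preference_loss`, `load_balance_penalty`) is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class MoEReward:
    """Hypothetical mixture of linear reward experts with a linear router."""

    def __init__(self, n_experts, feat_dim, seed=0):
        rng = random.Random(seed)
        # Assumption: each expert and each router head is one weight vector.
        self.experts = [[rng.gauss(0, 0.1) for _ in range(feat_dim)]
                        for _ in range(n_experts)]
        self.router = [[rng.gauss(0, 0.1) for _ in range(feat_dim)]
                       for _ in range(n_experts)]

    def gate(self, traj):
        """Trajectory-level soft routing: one gate vector per trajectory."""
        steps = len(traj)
        dim = len(traj[0])
        # Assumption: the router sees mean-pooled per-step features.
        pooled = [sum(step[d] for step in traj) / steps for d in range(dim)]
        return softmax([dot(w, pooled) for w in self.router])

    def trajectory_return(self, traj):
        """Gate-weighted sum of each expert's summed per-step reward."""
        g = self.gate(traj)
        returns = [sum(dot(w, step) for step in traj) for w in self.experts]
        return sum(gk * rk for gk, rk in zip(g, returns)), g


def preference_loss(model, traj_a, traj_b, label):
    """Bradley-Terry cross-entropy; label is 1.0 if traj_a is preferred."""
    ra, ga = model.trajectory_return(traj_a)
    rb, gb = model.trajectory_return(traj_b)
    p_a = 1.0 / (1.0 + math.exp(rb - ra))
    eps = 1e-12  # guards against log(0)
    loss = -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))
    return loss, [ga, gb]


def load_balance_penalty(gates):
    """Squared deviation of mean expert usage from uniform (0 if balanced)."""
    k = len(gates[0])
    usage = [sum(g[i] for g in gates) / len(gates) for i in range(k)]
    return sum((u - 1.0 / k) ** 2 for u in usage)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two toy trajectories of 5 steps with 4 features each.
    a = [[rng.random() for _ in range(4)] for _ in range(5)]
    b = [[rng.random() for _ in range(4)] for _ in range(5)]
    model = MoEReward(n_experts=3, feat_dim=4)
    loss, gates = preference_loss(model, a, b, label=1.0)
    total = loss + 0.01 * load_balance_penalty(gates)  # 0.01 is arbitrary
    print(total, gates)
```

In this sketch the gate is computed once per trajectory rather than once per step, which is one plausible reading of "trajectory-level" routing. The penalty term discourages the router from collapsing onto a single expert. A real implementation would use neural networks and gradient-based training in place of the fixed random weights here.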
