DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
Shanghaoran Quan

TL;DR
This paper introduces DMoERM, a Mixture-of-Experts approach to reward modeling for large language models. It targets poor generalization under multi-task training and noise in human-annotated data, and it reports better performance and closer agreement with human preferences.
Contribution
We propose the Double-Layer MoE RM (DMoERM), combining sparse and dense models with task-specific experts to improve reward modeling effectiveness and robustness.
Findings
Outperforms advanced generative approaches in human preference alignment
Reduces overoptimization in reward modeling
Achieves superior consistency with human annotations
Abstract
The performance of the reward model (RM) is a critical factor in improving the effectiveness of the large language model (LLM) during alignment fine-tuning. There remain two challenges in RM training: 1) training the same RM using various categories of data may cause its generalization performance to suffer from multi-task disturbance, and 2) the human annotation consistency rate is generally only 60% to 75%, causing training data to contain a lot of noise. To tackle these two challenges, we introduce the idea of Mixture-of-Experts (MoE) into the field of RM for the first time. We propose the Double-Layer MoE RM (DMoERM). The outer layer MoE is a sparse model. After classifying an input into task categories, we route it to the corresponding inner layer task-specific model. The inner layer MoE is a dense model. We decompose the specific task into multiple capability dimensions and…
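The two-layer routing described in the abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the class names `InnerDenseRM` and `OuterSparseRM`, the keyword-based `toy_classifier`, and the hand-written capability scorers are all hypothetical. Because the abstract is truncated before it says how the capability-dimension scores are combined, a plain mean is used here purely as a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a capability expert maps (prompt, response) to a scalar score.
Expert = Callable[[str, str], float]


@dataclass
class InnerDenseRM:
    """Dense inner layer: every capability expert scores every input."""
    experts: List[Expert]

    def score(self, prompt: str, response: str) -> float:
        scores = [expert(prompt, response) for expert in self.experts]
        # Assumption: a plain mean stands in for the real aggregation step,
        # which the truncated abstract does not describe.
        return sum(scores) / len(scores)


@dataclass
class OuterSparseRM:
    """Sparse outer layer: only one task-specific inner model runs per input."""
    classify: Callable[[str], str]
    inner: Dict[str, InnerDenseRM]

    def score(self, prompt: str, response: str) -> float:
        task = self.classify(prompt)  # route by task category
        return self.inner[task].score(prompt, response)


# Toy stand-ins (all hypothetical) for the classifier and capability experts.
def toy_classifier(prompt: str) -> str:
    return "code" if "function" in prompt.lower() else "chat"


def brevity(prompt: str, response: str) -> float:
    return 1.0 if len(response) < 50 else 0.0


def politeness(prompt: str, response: str) -> float:
    return 1.0 if "please" in response.lower() else 0.0


def has_code(prompt: str, response: str) -> float:
    return 1.0 if "return" in response else 0.0


rm = OuterSparseRM(
    classify=toy_classifier,
    inner={
        "chat": InnerDenseRM([brevity, politeness]),
        "code": InnerDenseRM([brevity, has_code]),
    },
)

code_score = rm.score("Write a function that adds two numbers",
                      "def add(a, b): return a + b")
chat_score = rm.score("Say hi", "Hello there")
```

In this sketch the outer layer is sparse because a single inner model is selected per input, while the inner layer is dense because all of its capability experts contribute to the final score.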
