DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling

Shanghaoran Quan

arXiv:2403.01197·cs.CL·April 30, 2024·1 cites

TL;DR

This paper introduces DMoERM, a novel Mixture-of-Experts approach for reward modeling in large language models, addressing generalization and noise issues, and demonstrating superior performance and consistency with human preferences.

Contribution

We propose the Double-Layer MoE RM (DMoERM), combining sparse and dense models with task-specific experts to improve reward modeling effectiveness and robustness.

Findings

1. Outperforms advanced generative approaches in human preference alignment
2. Reduces overoptimization in reward modeling
3. Achieves superior consistency with human annotations

Abstract

The performance of the reward model (RM) is a critical factor in improving the effectiveness of the large language model (LLM) during alignment fine-tuning. Two challenges remain in RM training: 1) training a single RM on several categories of data may hurt its generalization performance through multi-task disturbance, and 2) the human annotation consistency rate is generally only $60\%$ to $75\%$, so the training data contains a lot of noise. To tackle these two challenges, we introduce the idea of Mixture-of-Experts (MoE) into the field of RM for the first time. We propose the Double-Layer MoE RM (DMoERM). The outer layer MoE is a sparse model: after classifying an input into a task category, we route it to the corresponding inner layer task-specific model. The inner layer MoE is a dense model. We decompose the specific task into multiple capability dimensions and…
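
The double-layer routing described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the names `InnerDenseMoE` and `DoubleLayerRM`, the toy classifier, and the toy experts are hypothetical, and the weighted average used to combine capability scores is an assumption, since the abstract is cut off before it says how the scores are combined.

```python
from typing import Callable, Dict, List

# A capability expert maps (prompt, response) to a scalar score.
Scorer = Callable[[str, str], float]


class InnerDenseMoE:
    """Dense inner layer: every capability expert scores every input."""

    def __init__(self, experts: List[Scorer], weights: List[float]) -> None:
        if not experts or len(experts) != len(weights):
            raise ValueError("need one weight per expert and at least one expert")
        self.experts = experts
        self.weights = weights

    def score(self, prompt: str, response: str) -> float:
        # ASSUMPTION: a normalized weighted average stands in for whatever
        # aggregation the paper actually uses.
        weighted = sum(
            w * expert(prompt, response)
            for w, expert in zip(self.weights, self.experts)
        )
        return weighted / sum(self.weights)


class DoubleLayerRM:
    """Sparse outer layer: classify the input, then use exactly one task model."""

    def __init__(
        self,
        classify: Callable[[str], str],
        task_models: Dict[str, InnerDenseMoE],
    ) -> None:
        self.classify = classify
        self.task_models = task_models

    def score(self, prompt: str, response: str) -> float:
        task = self.classify(prompt)  # outer layer: hard routing by category
        return self.task_models[task].score(prompt, response)  # inner layer


# Toy stand-ins for illustration only; real experts would be trained models.
def toy_classify(prompt: str) -> str:
    return "math" if any(ch.isdigit() for ch in prompt) else "chat"


def length_expert(prompt: str, response: str) -> float:
    return float(len(response))


def constant_expert(prompt: str, response: str) -> float:
    return 1.0


def half_expert(prompt: str, response: str) -> float:
    return 0.5


rm = DoubleLayerRM(
    toy_classify,
    {
        "math": InnerDenseMoE([length_expert, constant_expert], [3.0, 1.0]),
        "chat": InnerDenseMoE([half_expert], [1.0]),
    },
)

math_score = rm.score("What is 2+2?", "four")  # routed to the "math" model
chat_score = rm.score("Say hello", "hello!")   # routed to the "chat" model
```

Only the task model chosen by the classifier runs for a given input (the sparse part), while all capability experts inside that model contribute to the final score (the dense part).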

Code & Models

Repositories

quanshr/dmoerm
PyTorch · Official

Datasets

quanshr/mtmc-rlhf
dataset · 10 dl

Taxonomy

Topics: Statistical and Computational Modeling · Wine Industry and Tourism · Forecasting Techniques and Applications