RM-Distiller: Exploiting Generative LLM for Reward Model Distillation
Hongli Zhou, Hui Huang, Wei Liu, Chenglong Wang, Xingyuan Bu, Lvyuan Han, Fuhai Song, Muyun Yang, Wenhao Jiang, Hailong Cao, Tiejun Zhao

TL;DR
This paper introduces RM-Distiller, a novel framework that leverages the multifaceted capabilities of generative LLMs to improve reward model distillation, leading to better alignment with human preferences.
Contribution
RM-Distiller systematically exploits three capabilities of teacher LLMs (refinement, scoring, and generation) for reward model distillation, rather than treating the teacher as a simple binary annotator.
Findings
Outperforms traditional distillation methods on RM benchmarks
Improves reinforcement learning-based alignment results
Demonstrates the importance of multifaceted teacher capabilities
Abstract
Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. Due to the difficulty of obtaining high-quality human preference annotations, distilling preferences from generative LLMs has emerged as a standard practice. However, existing approaches predominantly treat teacher models as simple binary annotators, failing to fully exploit their rich knowledge and capabilities for RM distillation. To address this, we propose RM-Distiller, a framework designed to systematically exploit the multifaceted capabilities of teacher LLMs: (1) Refinement capability, which synthesizes highly correlated response pairs to create fine-grained and contrastive signals. (2) Scoring capability, which guides the RM in capturing precise preference strength via a margin-aware optimization objective. (3) Generation capability, which incorporates the teacher's generative…
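The abstract's "margin-aware optimization objective" can be illustrated with a minimal sketch. The excerpt does not give the paper's exact formula, so the code below uses the widely known Bradley-Terry pairwise loss with an additive margin, -log σ(r_chosen − r_rejected − m), as a stand-in. The function names, the `scale` parameter, and the choice to derive the margin from the difference of teacher scores are all assumptions made for illustration, not the authors' implementation.

```python
import math


def teacher_margin(score_chosen: float, score_rejected: float, scale: float = 1.0) -> float:
    """Hypothetical helper: turn two teacher scores into a margin.

    Assumes the teacher LLM scores each response on a shared numeric scale,
    so a larger score gap means a stronger preference.
    """
    return scale * (score_chosen - score_rejected)


def margin_aware_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected - margin)).

    With margin = 0 this reduces to the plain Bradley-Terry loss that
    treats the teacher as a binary annotator.
    """
    z = r_chosen - r_rejected - margin
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Student RM rewards for one (chosen, rejected) pair.
    r_c, r_r = 2.0, 1.0
    # Illustrative teacher scores; a wider gap demands a wider reward gap.
    weak = teacher_margin(score_chosen=7.0, score_rejected=6.5, scale=0.5)
    strong = teacher_margin(score_chosen=9.0, score_rejected=3.0, scale=0.5)
    print("weak preference loss:  ", margin_aware_loss(r_c, r_r, weak))
    print("strong preference loss:", margin_aware_loss(r_c, r_r, strong))
```

Under this sketch, a pair the teacher prefers strongly incurs a higher loss until the student's reward gap exceeds the margin, which is one way preference strength (not just preference direction) could reach the student RM.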
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Recommender Systems and Techniques
