GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO

Yiyang Zhao; Huiyu Bai; Xuejiao Zhao

arXiv:2506.08965·cs.LG·June 11, 2025

TL;DR

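GFRIEND is a framework for training generative reward models for RLHF from few-shot data. It combines data augmentation, Chain-of-Thought-based preference refinement, perplexity-based preference scoring, and Multi-level Direct Preference Optimization (M-DPO) so that models trained on small datasets reach performance comparable to those trained on large-scale datasets.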

Contribution

The paper presents a new data augmentation and preference refinement framework that enhances reward model training with few-shot data, outperforming traditional methods like DPO.

Findings

01. Significant improvement in data efficiency for reward models.

02. Reward models trained with GFRIEND achieve performance comparable to models trained on large-scale datasets.

03. Enhanced preference understanding through Chain-of-Thought sampling and multi-level optimization.

Abstract

The ability to train high-performing reward models with few-shot data is critical for enhancing the efficiency and scalability of Reinforcement Learning from Human Feedback (RLHF). We propose a data augmentation and expansion framework that enables generative reward models trained on small datasets to achieve comparable performance to those trained on large-scale datasets. Traditional methods to train a generative reward model, such as Direct Preference Optimization (DPO), are constrained by inefficiencies in sample pairing and limited data diversity. This work introduces preference refinement, which employs Chain-of-Thought (CoT) sampling to uncover diverse and high-quality preference relationships. It also incorporates a perplexity-based scoring mechanism to assign nuanced preference levels and utilizes Multi-level Direct Preference Optimization (M-DPO) to enable the model to capture…
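The abstract names three ingredients: a perplexity-based score, graded preference levels, and a DPO-style objective. The Python sketch below illustrates how such pieces could fit together. It is not the authors' implementation. The perplexity formula and the per-pair DPO loss are the standard definitions. The `preference_level` bucketing, its `thresholds`, and the `weight` argument are hypothetical, introduced only for illustration, since the abstract does not specify how levels are computed or how M-DPO uses them.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sequence from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def preference_level(ppl_chosen, ppl_rejected, thresholds=(0.1, 0.5)):
    """Hypothetical bucketing (an assumption, not from the paper):
    map the log-perplexity gap between two samples to a level 1..3."""
    gap = math.log(ppl_rejected) - math.log(ppl_chosen)
    level = 1
    for t in thresholds:
        if gap > t:
            level += 1
    return level


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=0.1, weight=1.0):
    """Standard DPO loss for one preference pair of sequence log-probs.

    `weight` is a hypothetical per-pair scale (e.g. derived from a
    preference level); weight=1.0 recovers plain DPO.
    """
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return weight * _softplus(-margin)
```

A pair whose chosen sample has perplexity 2.0 and whose rejected sample has perplexity 4.0 falls into the highest of the three illustrative levels, and that level could then be passed to `dpo_loss` as a weight. Whether the paper scales the loss, the margin, or something else by level is not stated in the abstract.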

Code & Models

Repositories

snowteam2023/gfriend (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)