Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards

Xiaobao Wu

arXiv:2505.02686 · cs.CL · June 13, 2025

TL;DR

This survey reviews recent advances in reward-based learning for large language models, covering reward models, strategies, benchmarks, applications, challenges, and future directions in the field.

Contribution

It provides a comprehensive overview of reward learning techniques, models, and applications in LLMs, highlighting recent developments and future challenges.

Findings

01. Reward models are crucial for aligning LLM behavior.
02. Learning strategies such as RLHF and RLAIF are discussed.
03. Benchmark datasets and applications are summarized.

Abstract

Recent developments in Large Language Models (LLMs) have shifted from pre-training scaling to post-training and test-time scaling. Across these developments, a key unified paradigm has arisen: Learning from Rewards, where reward signals act as the guiding stars to steer LLM behavior. It has underpinned a wide range of prevalent techniques, such as reinforcement learning (RLHF, RLAIF, DPO, and GRPO), reward-guided decoding, and post-hoc correction. Crucially, this paradigm enables the transition from passive learning from static data to active learning from dynamic feedback. This endows LLMs with aligned preferences and deep reasoning capabilities for diverse tasks. In this survey, we present a comprehensive overview of learning from rewards, from the perspective of reward models and learning strategies across training, inference, and post-inference stages. We further discuss the…
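As a minimal illustration of one technique named in the abstract, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities under the policy and a frozen reference model. It is not code from the survey or its repository; the function names, the scalar-float interface, and the default beta=0.1 are illustrative assumptions. DPO has no separate reward model: the reward is implicit, given by beta times the log-ratio between the policy and the reference.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; controls deviation from the reference
) -> float:
    """Per-pair DPO loss (illustrative sketch, scalar inputs).

    Each argument is the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under either the
    trainable policy or the frozen reference model.
    """
    # Implicit rewards (up to the factor beta): policy/reference log-ratios.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Bradley-Terry style objective: push the chosen reward above the rejected one.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

Shifting probability mass toward the chosen response (for example, raising policy_chosen_logp while the other inputs stay fixed) lowers the loss below log(2). This is the sense in which the preference signal steers the policy without any explicit reward model.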

Code & Models

Repositories

bobxwu/learning-from-rewards-llm-papers (official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning and Algorithms

Methods: Direct Preference Optimization