Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

Jiayun Wu; Peixu Hou; Shan Qu; Peng Zhang; Ning Gu; Tun Lu

arXiv:2603.20212·cs.CL·March 24, 2026

Open Access

TL;DR

This paper introduces Fast-Slow Thinking Reward Models (F/S-RM), a hybrid approach that combines efficient scalar reward estimation with accurate generative reasoning, improving performance and reducing computational costs in LLM alignment.

Contribution

The paper proposes a hybrid reward model architecture, inspired by Dual Process Theory, that integrates the scalar and generative reward paradigms within a single model.

Findings

01 Achieves a 1.2% relative performance improvement over state-of-the-art models

02 Reduces token consumption by 20.8%

03 Demonstrates effective integration of fast and slow thinking in reward modeling

Abstract

Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely, Scalar Reward Models (SRMs) offer efficiency but suffer from limited performance and adaptability in complex scenarios. We introduce Fast-Slow Thinking Reward Models (F/S-RM), a hybrid RM architecture inspired by Dual Process Theory. It trains a single model to integrate two distinct reward paradigms: first-token prediction as a scalar score (fast thinking) and CoT-based judgment (slow thinking), regulated by a dual-confidence activation mechanism that determines when to activate slow thinking. F/S-RM achieves a 1.2% relative performance improvement over state-of-the-art models while reducing token…
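
The routing idea in the abstract (answer from the first token when confident, otherwise fall back to CoT) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which two confidence signals make up the dual-confidence mechanism, so the sketch assumes, hypothetically, the probability of the top verdict token and its margin over the other verdict. The names `fast_verdict`, `route`, `slow_judge`, and `VERDICTS`, the pairwise "A"/"B" setup, and both thresholds are invented for illustration.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical verdict tokens for a pairwise comparison ("A" or "B" is better).
VERDICTS = ("A", "B")


def fast_verdict(first_token_logits: Dict[str, float]) -> Tuple[str, float, float]:
    """Fast path: read a verdict off the logits of the first generated token.

    Returns (best verdict, its probability, margin over the other verdict).
    Probabilities are a softmax over every token in `first_token_logits`.
    """
    peak = max(first_token_logits.values())  # subtract the max for stability
    exps = {tok: math.exp(v - peak) for tok, v in first_token_logits.items()}
    total = sum(exps.values())
    probs = {v: exps[v] / total for v in VERDICTS}
    best, other = sorted(VERDICTS, key=lambda v: probs[v], reverse=True)
    return best, probs[best], probs[best] - probs[other]


def route(
    first_token_logits: Dict[str, float],
    slow_judge: Callable[[], str],
    min_prob: float = 0.8,    # assumed threshold, not taken from the paper
    min_margin: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[str, str]:
    """Return (verdict, path). The slow CoT judge runs only when either
    assumed confidence signal falls below its threshold."""
    best, prob, margin = fast_verdict(first_token_logits)
    if prob >= min_prob and margin >= min_margin:
        return best, "fast"
    return slow_judge(), "slow"
```

In this sketch `slow_judge` stands in for a full CoT generation call, so any token saving comes from the cases where `route` returns on the fast path without invoking it.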


Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Topic Modeling