Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models
Jiayun Wu, Peixu Hou, Shan Qu, Peng Zhang, Ning Gu, Tun Lu

TL;DR
This paper introduces Fast-Slow Thinking Reward Models (F/S-RM), a hybrid approach that combines efficient scalar reward estimation with accurate generative reasoning, improving performance and reducing computational costs in LLM alignment.
Contribution
The paper proposes a hybrid reward model architecture, inspired by Dual Process Theory, that integrates the scalar and generative reward paradigms within a single model.
Findings
Achieves a 1.2% relative performance improvement over state-of-the-art models
Reduces token consumption by 20.8%
Demonstrates effective integration of fast and slow thinking in reward modeling
Abstract
Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely, Scalar Reward Models (SRMs) offer efficiency but suffer from limited performance and adaptability in complex scenarios. We introduce Fast-Slow Thinking Reward Models (F/S-RM), a hybrid RM architecture inspired by Dual Process Theory. It trains a single model to integrate two distinct reward paradigms: first-token prediction as a scalar score (fast thinking) and CoT-based judgment (slow thinking), regulated by a dual-confidence activation mechanism that determines when to activate slow thinking. F/S-RM achieves a 1.2% relative performance improvement over state-of-the-art models while reducing token…
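The routing idea in the abstract (score from the first token, fall back to CoT judgment when confidence is low) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the two confidence signals or their thresholds. The sketch assumes a pairwise setting with verdict tokens "A" and "B", and invents two hypothetical signals, `verdict_mass` (how much first-token probability lands on a verdict token at all) and `preference_confidence` (how decisively one verdict beats the other). The function names, the thresholds, and the `slow_judge` callback are likewise assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: val / total for tok, val in exps.items()}


def fast_signals(first_token_logits):
    """Fast thinking: read a scalar score and two confidences off the first token.

    Both confidence definitions are hypothetical stand-ins, not the paper's.
    """
    probs = softmax(first_token_logits)
    p_a, p_b = probs["A"], probs["B"]
    verdict_mass = p_a + p_b                  # assumed signal 1
    score = p_a / verdict_mass                # scalar reward: P(A preferred)
    preference_confidence = max(score, 1.0 - score)  # assumed signal 2
    return score, verdict_mass, preference_confidence


def judge(first_token_logits, slow_judge, min_mass=0.8, min_pref=0.8):
    """Return (route, score); call the slow CoT judge only when either signal is low."""
    score, mass, pref = fast_signals(first_token_logits)
    if mass < min_mass or pref < min_pref:
        return "slow", slow_judge()  # expensive path: generate CoT, then a verdict
    return "fast", score             # cheap path: no reasoning tokens generated


# A decisive first token stays on the fast path.
print(judge({"A": 4.0, "B": 0.0, "other": 0.0}, slow_judge=lambda: 1.0))
# A near-tie between "A" and "B" triggers the slow path.
print(judge({"A": 0.2, "B": 0.0, "other": 0.0}, slow_judge=lambda: 0.0))
```

In this sketch, token savings come only from the examples that stay on the fast path, so the thresholds trade accuracy against cost. How F/S-RM actually sets that trade-off is not stated in the abstract excerpt above.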
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Topic Modeling
