Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance

Rudransh Agnihotri; Ananya Pandey

arXiv:2506.05748·cs.LG·June 9, 2025

Open Access

TL;DR

This paper introduces a cost-effective, plug-and-play LLM-based judge that replaces heavyweight models in RLHF, achieving state-of-the-art performance and high interpretability with minimal additional parameters.

Contribution

It presents a novel method using a frozen instruction-tuned 7B LLM with a tiny LoRA adapter as an effective, transparent reward model for RLHF, eliminating the offline tuning phase.
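As a rough illustration of how a rubric-driven judge could be turned into a reward signal, the sketch below builds a pairwise judging prompt and maps the judge's JSON reply to a scalar reward. The rubric text, the `winner`/`reason` fields, and the function names `build_judge_prompt` and `parse_verdict` are all hypothetical; the paper's actual rubric and output schema are not shown on this page.

```python
import json

# Hypothetical one-line JSON rubric (an assumption, not the paper's rubric).
RUBRIC = json.dumps({
    "criteria": ["correctness", "helpfulness"],
    "output": {"winner": "A|B", "reason": "string"},
})


def build_judge_prompt(question, answer_a, answer_b, rubric=RUBRIC):
    """Assemble a pairwise comparison prompt for the judge model."""
    return (
        f"Rubric: {rubric}\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Reply with a single JSON object that follows the rubric's output schema."
    )


def parse_verdict(raw):
    """Map the judge's reply to (reward_for_answer_a, reason).

    Malformed or unrecognised replies yield a neutral reward of 0.0.
    """
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0, ""
    if not isinstance(verdict, dict):
        return 0.0, ""
    reason = str(verdict.get("reason", ""))
    winner = verdict.get("winner")
    if winner == "A":
        return 1.0, reason
    if winner == "B":
        return -1.0, reason
    return 0.0, reason
```

Keeping the stated reason alongside the reward is one simple way to preserve the interpretability the summary emphasises; how the paper itself combines the two is not specified here.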

Findings

01

Achieves 96.2% accuracy on RewardBench, outperforming specialized reward networks of 27B to 70B parameters.

02

Enables a 7B actor trained with online PPO to reach 92% exact-match accuracy on GSM-8K, surpassing the top 70B DPO baseline (61.8%).

03

The LoRA judge's explanations score 9/10 for similarity to human explanations when rated by GPT-4.

Abstract

Reward-model training is the cost bottleneck in modern Reinforcement Learning from Human Feedback (RLHF) pipelines, often requiring tens of billions of parameters and an offline preference-tuning phase. In the proposed method, a frozen, instruction-tuned 7B LLM is augmented with only a one-line JSON rubric and a rank-16 LoRA adapter (affecting just 0.8% of the model's parameters), enabling it to serve as a complete substitute for the heavyweight evaluation models used previously. The plug-and-play judge achieves 96.2% accuracy on RewardBench, outperforming specialized reward networks ranging from 27B to 70B parameters. Additionally, using online PPO, it allows a 7B actor to reach 92% exact-match accuracy on GSM-8K, outperforming the top 70B DPO baseline, which scores 61.8%. Thorough ablations indicate that (i) six in-context demonstrations deliver the majority of the zero-to-few-shot…
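The adapter-size claim in the abstract rests on simple arithmetic: a rank-r LoRA adapter on a weight matrix of shape d_out × d_in adds r · (d_in + d_out) trainable parameters. The sketch below computes the resulting fraction of the base model. The layer shapes, layer count, and base parameter count in the usage lines are illustrative assumptions, not values from the paper, so the result is not expected to reproduce the 0.8% figure, which depends on which modules the authors adapt.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_fraction(layer_shapes, n_layers, rank, base_params):
    """Fraction of base-model parameters added when every listed matrix
    in each of n_layers transformer blocks receives a rank-`rank` adapter."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in layer_shapes)
    return n_layers * per_layer / base_params


# Hypothetical configuration: four 4096 x 4096 attention projections per block,
# 32 blocks, and a 7e9-parameter base model (all assumptions for illustration).
shapes = [(4096, 4096)] * 4
fraction = adapter_fraction(shapes, n_layers=32, rank=16, base_params=7_000_000_000)
print(f"adapter adds {fraction:.2%} of base parameters")
```

Adapting more matrices per block (for example the feed-forward projections) raises the fraction proportionally, which is why the percentage alone does not pin down the exact adapter placement.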

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Emotion and Mood Recognition · Adversarial Robustness in Machine Learning