GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning

Yanchen Xu; Ziheng Jiao; Hongyuan Zhang; Xuelong Li

arXiv:2511.15256 · cs.LG · November 20, 2025

TL;DR

This paper introduces GRPO-RM, a reinforcement learning approach for fine-tuning representation models. It adapts the GRPO method from large language models to improve representation-model performance on real-world datasets.

Contribution

We extend the GRPO reinforcement learning technique to representation models by proposing a predefined output set that forms the output group and a specialized reward function for post-training optimization.

Findings

1. GRPO-RM improves representation model performance on multiple datasets.

2. A predefined output set functionally replaces token sequence sampling, producing the output group that GRPO requires.

3. Experimental results validate the approach's effectiveness.

Abstract

Group Relative Policy Optimization (GRPO), a reinforcement learning method used to fine-tune large language models (LLMs), has proved effective in practical applications such as DeepSeek-R1. This raises the question of whether GRPO can be generalized to representation learning models. In this paper, we propose Group Relative Policy Optimization for Representation Model (GRPO-RM) and investigate how a GRPO-like policy performs when post-training representation models. Specifically, our method establishes a predefined output set that functionally replaces token sequence sampling in LLMs, thereby generating the output group that is essential for the probability-driven optimization of GRPO. In addition, a specialized reward function is designed to accommodate the properties of representation models. Extensive experiments are conducted on various real-world datasets to validate the…
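The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the reward function or the exact objective, so the 0/1 reward, the function names, and the simplified surrogate below are all assumptions made for illustration. The only part taken from standard GRPO is the group-relative advantage, which normalizes each reward by the mean and standard deviation of its group. Clipping and the KL penalty used in GRPO are omitted here.

```python
import math

def softmax(logits):
    """Turn model scores over the predefined output set into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization: (r - mean) / std within one group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def grpo_style_objective(logits, rewards):
    """Simplified surrogate (assumption): probability-weighted advantages.

    Every element of the predefined output set is a group member, so no
    sampling is needed. Clipping and the KL term are left out.
    """
    probs = softmax(logits)
    advantages = group_relative_advantages(rewards)
    return sum(p * a for p, a in zip(probs, advantages))

# Hypothetical example: four candidate outputs (e.g. class labels),
# with index 2 treated as the correct one and given reward 1.
logits = [0.1, 0.3, 1.2, -0.5]
rewards = [0.0, 0.0, 1.0, 0.0]  # assumed 0/1 reward, not the paper's

advantages = group_relative_advantages(rewards)
objective = grpo_style_objective(logits, rewards)
print(advantages)
print(objective)
```

In this sketch the advantages sum to approximately zero within the group, the rewarded output gets a positive advantage, and raising its logit increases the objective. Maximizing the objective therefore shifts probability mass toward outputs that score above the group average.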

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics