Retrieval-GRPO: A Multi-Objective Reinforcement Learning Framework for Dense Retrieval in Taobao Search

Xingxian Liu; Dongshuai Li; Jiahui Wan; Tao Wen; Gui Ling; Yuliang Yan; Fuyu Lv; Dan Ou; Haihong Tang; Bo Zheng

arXiv:2511.13885 · cs.IR · February 10, 2026

TL;DR

Retrieval-GRPO introduces a reinforcement learning framework for dense e-commerce search retrieval that dynamically generates training samples and optimizes multiple objectives, improving semantic accuracy and efficiency.

Contribution

It presents a novel multi-objective reinforcement learning approach that replaces offline hard negative sampling with real-time candidate retrieval and integrates LLM-based relevance feedback.
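The idea of scoring freshly retrieved candidates instead of mining hard negatives offline can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weighted-sum reward, the weights, and the toy vectors are all hypothetical, and the relevance scores stand in for whatever an LLM-based judge would return. The only part taken from GRPO in general is the normalization of rewards within a group of candidates for the same query.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query_vec: Sequence[float],
                   item_vecs: Sequence[Sequence[float]],
                   k: int) -> List[int]:
    """Return indices of the k items with the highest dot-product score.

    Stands in for real-time candidate retrieval with the current model
    (a brute-force scan here; a production system would use an ANN index).
    """
    scores = [dot(query_vec, v) for v in item_vecs]
    order = sorted(range(len(item_vecs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def combined_reward(relevance: float, other: float,
                    w_rel: float = 0.7, w_other: float = 0.3) -> float:
    """Hypothetical multi-objective reward: a plain weighted sum.

    `relevance` stands in for an LLM relevance judgment and `other` for a
    non-relevance objective; the weights are illustrative assumptions.
    """
    return w_rel * relevance + w_other * other


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one query's candidate group.

    Each advantage is (reward - group mean) / (group std + eps), so
    candidates are compared against each other, not against a learned critic.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy usage with made-up embeddings and made-up judge scores.
query = [1.0, 0.0]
items = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3], [-0.5, 0.5]]
candidates = retrieve_top_k(query, items, k=3)

# Hypothetical (relevance, other-objective) scores per item index.
judge_scores = {0: (1.0, 0.2), 1: (0.0, 0.9), 2: (0.5, 0.5)}
rewards = [combined_reward(*judge_scores[i]) for i in candidates]
advantages = group_relative_advantages(rewards)
```

In this sketch a candidate that scores above its group's mean reward gets a positive advantage and one below gets a negative advantage, so the retrieved set itself supplies the contrast that a separately built hard-negative set would otherwise provide.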

Findings

01. Enhanced semantic generalization for long-tail queries

02. Eliminated reliance on offline hard negatives

03. Improved online retrieval performance

Abstract

Dense retrieval, as the core component of e-commerce search engines, maps user queries and items into a unified semantic space through pre-trained embedding models to enable large-scale real-time semantic retrieval. Despite the rapid advancement of LLMs gradually replacing traditional BERT architectures for embedding, their training paradigms still adhere to BERT-like supervised fine-tuning and hard negative mining strategies. This approach relies on complex offline hard negative sample construction pipelines, which constrain model iteration efficiency and hinder the evolutionary potential of semantic representation capabilities. Besides, existing multi-task learning frameworks face the seesaw effect when simultaneously optimizing semantic relevance and non-relevance objectives. In this paper, we propose Retrieval-GRPO, a multi-objective reinforcement learning-based dense retrieval…

Taxonomy

Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Advanced Image and Video Retrieval Techniques