RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce

Zhiguo Chen; Guohao Sun; Yiming Qiu; Xingzhi Yao; Mingming Li; Huimu Wang; Yangqi Zhang; Songlin Wang; Sulong Xu

arXiv:2602.23964 · cs.IR · April 29, 2026

TL;DR

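RAD-DPO adapts Direct Preference Optimization (DPO) to generative retrieval over structured Semantic IDs in e-commerce search. It targets three weaknesses of applying DPO directly: penalized shared prefixes, noisy pseudo-negatives from implicit feedback, and probability squeezing among valid candidates in multi-label queries. The authors report gains in both offline evaluation and online deployment.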

Contribution

The paper introduces RAD-DPO, a robust adaptive variant of DPO that combines token-level gradient detachment, similarity-based reward weighting, and a multi-label contrastive objective to model preferences over structured Semantic IDs.

Findings

01 Significant improvements in retrieval precision.

02 Enhanced robustness to noisy feedback.

03 Proven effectiveness in large-scale online deployment.

Abstract

Generative Retrieval (GR) is rapidly transforming e-commerce search by replacing traditional multi-stage pipelines with the autoregressive decoding of structured Semantic IDs (SIDs). Despite this architectural efficiency, aligning GR models with nuanced, real-world user preferences remains a critical challenge. While Direct Preference Optimization (DPO) offers an efficient alignment solution, its direct application to structured SIDs suffers from three limitations: (i) it penalizes shared hierarchical prefixes, causing gradient conflicts; (ii) it is vulnerable to noisy pseudo-negatives from implicit feedback; and (iii) in multi-label queries with multiple relevant items, it exacerbates a probability "squeezing effect" among valid candidates. To address these issues, we propose RAD-DPO, which introduces token-level gradient detachment to protect prefix structures, similarity-based…
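
To make limitation (i) concrete, here is a minimal Python sketch, not the authors' implementation. It assumes an SID is a list of integer codes, and that "token-level gradient detachment" means the rejected SID's tokens inside the prefix it shares with the chosen SID are excluded from the gradient; the excerpt above does not give the exact rule, so that reading is an assumption. The helper names (`shared_prefix_len`, `trainable_mask`, `dpo_loss`) are hypothetical. The loss is the standard sequence-level DPO objective, without any RAD-DPO weighting terms.

```python
import math
from typing import List, Sequence


def shared_prefix_len(chosen: Sequence[int], rejected: Sequence[int]) -> int:
    """Number of leading SID tokens that the two items have in common."""
    n = 0
    for a, b in zip(chosen, rejected):
        if a != b:
            break
        n += 1
    return n


def trainable_mask(chosen: Sequence[int], rejected: Sequence[int]) -> List[bool]:
    """Per-token mask over the rejected SID (assumed detachment rule).

    False marks a token inside the shared prefix, which would be detached;
    True marks a token that would still receive gradient.
    """
    k = shared_prefix_len(chosen, rejected)
    return [i >= k for i in range(len(rejected))]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss, -log(sigmoid(z)), from sequence log-probabilities."""
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical SIDs: both items share the first two hierarchy levels.
chosen_sid = [3, 7, 1, 9]
rejected_sid = [3, 7, 4, 2]
mask = trainable_mask(chosen_sid, rejected_sid)
```

In an autograd framework, the positions marked False would have their log-probabilities passed through a stop-gradient operation (for example `torch.Tensor.detach` or `jax.lax.stop_gradient`) before being summed into the rejected sequence's log-probability, so the forward loss value stays the same and only the backward pass changes.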
