Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation

Jie Jiang; Yangru Huang; Zeyu Wang; Changping Wang; Yuling Xiong; Jun Zhang; Huan Yu

arXiv:2602.10699·cs.AI·February 13, 2026

TL;DR

V-STAR introduces a value-guided sampling framework for autoregressive generative recommendation. It improves exploration and learning efficiency by addressing the probability-reward mismatch and the myopic, likelihood-driven bias of standard decoding.

Contribution

The paper presents V-STAR, a framework that combines value-guided decoding with sibling-relative advantage computation to improve generative recommendation performance.
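This page names sibling-relative advantage computation but does not give its formula. The Python below is a minimal sketch under our own assumptions, not V-STAR's definition. It contrasts a flat group baseline (each sampled item's reward minus the mean reward of the whole group, in the spirit of group-relative RL) with one plausible sibling-relative variant. Items are short semantic-ID token tuples. The value of a prefix is estimated as the mean reward of the sampled items beneath it, and each decoding step is credited with its prefix value minus the mean value of its sampled siblings. The toy data, the function names (`group_advantages`, `prefix_values`, `sibling_advantages`), and this exact advantage definition are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy group: each sampled item is a 3-token semantic ID with a made-up reward.
samples = {
    ("a", "x", "1"): 1.0,
    ("a", "x", "2"): 0.9,
    ("a", "y", "1"): 0.2,
    ("b", "z", "1"): 0.1,
}

def group_advantages(samples):
    """Flat baseline (assumption): reward minus the mean reward of the whole group."""
    mu = mean(samples.values())
    return {seq: r - mu for seq, r in samples.items()}

def prefix_values(samples):
    """Estimate V(prefix) as the mean reward of sampled items that start with it."""
    buckets = defaultdict(list)
    for seq, r in samples.items():
        for d in range(1, len(seq) + 1):
            buckets[seq[:d]].append(r)
    return {prefix: mean(rs) for prefix, rs in buckets.items()}

def sibling_advantages(samples):
    """Hypothetical per-step credit: V(node) minus the mean V of its sampled siblings.

    Siblings are the sampled children of the same parent prefix, so a step
    with only one sampled child always gets zero advantage.
    """
    values = prefix_values(samples)
    children = defaultdict(set)
    for prefix in values:
        children[prefix[:-1]].add(prefix)
    out = {}
    for seq in samples:
        steps = []
        for d in range(1, len(seq) + 1):
            node = seq[:d]
            siblings = children[node[:-1]]
            steps.append(values[node] - mean(values[s] for s in siblings))
        out[seq] = steps
    return out

if __name__ == "__main__":
    flat = group_advantages(samples)
    tree = sibling_advantages(samples)
    for seq in samples:
        print(seq, round(flat[seq], 3), [round(a, 3) for a in tree[seq]])
```

In this toy group, `("a", "x", "1")` and `("a", "x", "2")` share a two-token prefix, so the flat baseline gives them similar credit (0.45 and 0.35). The sibling view instead places most of their advantage on the earlier branching steps and only ±0.05 on the final choice between them, while steps with a single sampled child receive zero.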

Findings

1. Outperforms state-of-the-art baselines in accuracy and diversity.
2. Enhances exploration efficiency without exhaustive search.
3. Demonstrates effectiveness on offline and online datasets.

Abstract

Generative recommendation via autoregressive models has unified retrieval and ranking into a single conditional generation framework. However, fine-tuning these models with Reinforcement Learning (RL) often suffers from a fundamental probability-reward mismatch. Conventional likelihood-dominated decoding (e.g., beam search) exhibits a myopic bias toward locally probable prefixes, which causes two critical failures: (1) insufficient exploration, where high-reward items in low-probability branches are prematurely pruned and rarely sampled, and (2) advantage compression, where trajectories sharing high-probability prefixes receive highly correlated rewards with low within-group variance, yielding a weak comparative signal for RL. To address these challenges, we propose V-STAR, a Value-guided Sampling and Tree-structured Advantage Reinforcement framework. V-STAR forms a self-evolving loop…
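The abstract attributes insufficient exploration to likelihood-ranked decoding that prunes low-probability branches early. The toy Python below shows that effect with beam width 1 on a hypothetical two-level semantic-ID tree. It is a sketch under stated assumptions, not V-STAR's procedure. Ranking by cumulative log-probability alone commits to the locally likely first token `a` and returns a low-reward item. Adding a value bonus `lam * value(prefix)` to the ranking score instead keeps branch `b`, which leads to the high-reward item. The tree, the rewards, the additive scoring rule, and the exact `value` oracle are all assumptions for illustration: a real system would learn a value estimate rather than enumerate the subtree. The paper describes value-guided sampling, and this page does not give its scoring rule.

```python
import math

# Hypothetical toy model: next-token probabilities given a prefix of semantic-ID tokens.
PROBS = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.6, "y": 0.4},
    ("b",): {"z": 1.0},
}
# Hypothetical rewards for complete items (depth-2 semantic IDs).
REWARD = {("a", "x"): 0.1, ("a", "y"): 0.2, ("b", "z"): 1.0}

def value(prefix):
    """Stand-in value head: exact expected reward of completions under `prefix`.

    Computed by enumerating the toy tree; a real system would learn this estimate.
    """
    if prefix in REWARD:
        return REWARD[prefix]
    return sum(p * value(prefix + (tok,)) for tok, p in PROBS[prefix].items())

def beam_search(width, depth, lam=0.0):
    """Keep the `width` best prefixes per step, ranked by log-prob + lam * value(prefix).

    lam=0.0 recovers plain likelihood-ranked beam search.
    """
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(depth):
        candidates = [
            (prefix + (tok,), logp + math.log(p))
            for prefix, logp in beams
            for tok, p in PROBS[prefix].items()
        ]
        candidates.sort(key=lambda c: c[1] + lam * value(c[0]), reverse=True)
        beams = candidates[:width]
    return [prefix for prefix, _ in beams]

if __name__ == "__main__":
    print(beam_search(1, 2))           # likelihood only
    print(beam_search(1, 2, lam=1.0))  # likelihood plus value bonus
```

With width 1, the likelihood-only search returns `("a", "x")` (reward 0.1), and the value-guided variant returns `("b", "z")` (reward 1.0). Widening the plain beam to 2 would also retain `("b", "z")` in this toy tree. That suggests the failure here comes from pruning at narrow widths, not from the item being unreachable.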

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning