ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation
Peiyan Zhang, Hanmo Liu, Chengxuan Tong, Yuxia Wu, Wei Guo, Yong Liu

TL;DR
ReCast improves reinforcement learning (RL) for generative recommendation by first repairing and then contrasting within-group learning signals, yielding gains in both recommendation accuracy and system efficiency.
Contribution
The paper proposes a repair-then-contrast learning-signal framework that improves learnability and training efficiency for RL in sparse-hit generative recommendation.
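The repair-then-contrast idea can be sketched for a single rollout group as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not specify the details, so the following are hypothetical stand-ins: the `Rollout` record and helper names; the repair rule, which appends a caller-supplied reference rollout such as the ground-truth item when no rollout hit; the definition of the "hardest negative" as the miss with the highest policy log-probability; and the DPO-style logistic loss on the log-probability margin.

```python
import math
from dataclasses import dataclass


@dataclass
class Rollout:
    item: str       # generated item identifier (hypothetical representation)
    reward: float   # e.g. 1.0 for a hit, 0.0 for a miss
    logprob: float  # sequence log-probability under the current policy


def repair_group(group, reference):
    """Hypothetical repair: if no rollout hit, append a caller-supplied
    reference rollout (e.g. the ground-truth item) as the only positive."""
    if any(r.reward > 0 for r in group):
        return group
    return group + [reference]


def boundary_pair(group):
    """Return (strongest positive, hardest negative), or None if the group
    lacks one side. The hardest negative is assumed to be the miss that the
    policy currently assigns the highest log-probability."""
    positives = [r for r in group if r.reward > 0]
    negatives = [r for r in group if r.reward <= 0]
    if not positives or not negatives:
        return None
    pos = max(positives, key=lambda r: r.reward)
    neg = max(negatives, key=lambda r: r.logprob)
    return pos, neg


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(pos, neg, beta=1.0):
    """Assumed logistic loss on the margin: -log sigmoid(beta * margin)."""
    margin = beta * (pos.logprob - neg.logprob)
    return softplus(-margin)


# Usage: an all-zero group of three misses (log-probs are made-up numbers).
group = [
    Rollout("item_a", 0.0, -2.1),
    Rollout("item_b", 0.0, -0.7),
    Rollout("item_c", 0.0, -3.4),
]
# In practice the reference's log-prob would come from scoring it with the policy.
group = repair_group(group, Rollout("item_true", 1.0, -5.0))
pos, neg = boundary_pair(group)
loss = pairwise_loss(pos, neg)  # only two sequences feed the update
```

In this sketch, exactly two sequences reach the loss regardless of group size. That is one way to picture, under these assumptions, how actor-side update width can stay fixed while rollout search width grows.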
Findings
ReCast achieves up to 36.6% relative improvement in Pass@1 over OpenOneRec-RL.
ReCast cuts actor-side update time by a factor of 16.60.
ReCast lowers peak memory usage by 16.5%.
Abstract
Generic group-based RL assumes that sampled rollout groups are already usable learning signals. We show that this assumption breaks down in sparse-hit generative recommendation, where many sampled groups never become learnable at all. We propose ReCast, a repair-then-contrast learning-signal framework that first restores minimal learnability for all-zero groups and then replaces full-group reward normalization with a boundary-focused contrastive update on the strongest positive and the hardest negative. ReCast leaves the outer RL framework unchanged, modifies only within-group signal construction, and partially decouples rollout search width from actor-side update width. Across multiple generative recommendation tasks, ReCast consistently outperforms OpenOneRec-RL, achieving up to 36.6% relative improvement in Pass@1. Its matched-budget advantage is substantially larger: ReCast reaches…
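The following sketch shows why all-zero groups are not learnable under full-group reward normalization. It assumes GRPO-style advantages, where each rollout's reward is centered by the group mean and scaled by the group standard deviation. The exact normalization used by OpenOneRec-RL is an assumption here. A group with one hit yields nonzero advantages. A group in which every rollout misses yields all-zero advantages, so it contributes no policy-gradient signal.

```python
import statistics


def group_normalized_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A group with one hit: the hit is pushed up, the misses are pushed down.
mixed = group_normalized_advantages([1.0, 0.0, 0.0, 0.0])

# A sparse-hit group where every rollout misses: every advantage is exactly 0,
# so the policy-gradient term vanishes and the group teaches the actor nothing.
all_zero = group_normalized_advantages([0.0, 0.0, 0.0, 0.0])
```

Dropping the standard-deviation division does not help. Mean-centering alone already maps a constant-reward group to zero advantages.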
