Scaling Transformers for Discriminative Recommendation via Generative Pretraining

Chunqi Wang; Bingchao Wu; Zheng Chen; Lei Shen; Bing Wang; Xiaoyi Zeng

arXiv:2506.03699 · cs.IR · August 12, 2025

TL;DR

This paper introduces GPSD, a framework that uses generative pretraining and sparse parameter freezing to improve the scalability and generalization of large discriminative recommendation models, mitigating the overfitting that otherwise worsens as these models grow.

Contribution

GPSD is a novel framework that leverages generative pretraining to initialize discriminative models and employs sparse parameter freezing, enhancing scalability and reducing overfitting in recommendation systems.
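To make the contrast between the two training regimes concrete, here is a plain-Python sketch of a generative pretraining loss next to a discriminative CTR loss. It assumes the generative objective is autoregressive next-item prediction over a user's behavior sequence (softmax cross-entropy over the item vocabulary at each position). This summary does not spell out the exact objective, and the function names `next_item_loss` and `ctr_bce_loss` are hypothetical.

```python
import math


def next_item_loss(logits_per_step, targets):
    """Mean softmax cross-entropy for next-item prediction (assumed generative objective).

    logits_per_step[t] holds a score for every candidate item given the behavior
    prefix up to position t; targets[t] is the index of the item that actually came next.
    """
    total = 0.0
    for logits, target in zip(logits_per_step, targets):
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[target]  # equals -log softmax(logits)[target]
    return total / len(targets)


def ctr_bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy for CTR prediction: one click/no-click label per impression."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


# Toy usage: a 4-item vocabulary and a 2-step behavior prefix.
gen_loss = next_item_loss([[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]], [1, 0])
disc_loss = ctr_bce_loss([0.5, 0.5], [1, 0])
```

Each position in a behavior sequence supplies a multi-class target, whereas the CTR loss receives a single binary label per impression. The sketch only illustrates the form of the two objectives, not the model GPSD actually trains.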

Findings

01

GPSD significantly narrows the generalization gap, i.e., the difference between training and evaluation performance.

02

It achieves consistent performance improvements as model size scales from 13K to 0.3B parameters.

03

GPSD delivers superior online A/B test results in industrial settings.

Abstract

Discriminative recommendation tasks, such as CTR (click-through rate) and CVR (conversion rate) prediction, play critical roles in the ranking stage of large-scale industrial recommender systems. However, training a discriminative model encounters a significant overfitting issue induced by data sparsity. Moreover, this overfitting issue worsens with larger models, causing them to underperform smaller ones. To address the overfitting issue and enhance model scalability, we propose a framework named GPSD (Generative Pretraining for Scalable Discriminative Recommendation), drawing inspiration from generative training, which exhibits no evident signs of overfitting. GPSD leverages the parameters learned from a pretrained generative model to initialize a discriminative model, and subsequently applies a sparse parameter freezing strategy. Extensive…
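The transfer-then-freeze recipe described in the abstract can be sketched with plain Python objects standing in for model parameters. This is a minimal illustration under stated assumptions, not the authors' implementation. Parameters are matched by name (the names `item_emb`, `attn.w`, `next_item_head`, and `ctr_head` are hypothetical), every shared parameter is copied from the generative model, "sparse" is taken to mean embedding-table parameters, and freezing means those parameters receive no updates during discriminative fine-tuning while dense parameters keep training.

```python
from dataclasses import dataclass


@dataclass
class Param:
    value: list          # flat list of floats standing in for a tensor
    sparse: bool         # assumption: True for embedding tables, False for dense weights
    frozen: bool = False


def init_from_pretrained(disc_params, gen_params):
    """Copy every parameter the discriminative model shares (by name) with the generative one."""
    for name, p in disc_params.items():
        if name in gen_params:
            p.value = list(gen_params[name].value)  # copy, so the two models do not alias


def freeze_sparse(params):
    """Sparse parameter freezing (as assumed here): embedding tables stop receiving updates."""
    for p in params.values():
        if p.sparse:
            p.frozen = True


def sgd_step(params, grads, lr=0.1):
    """Plain SGD that skips frozen parameters."""
    for name, g in grads.items():
        p = params[name]
        if p.frozen:
            continue
        p.value = [v - lr * gi for v, gi in zip(p.value, g)]


# Toy usage with hypothetical parameter names.
gen = {
    "item_emb": Param([0.5, -0.2], sparse=True),
    "attn.w": Param([1.0, 2.0], sparse=False),
    "next_item_head": Param([0.3], sparse=False),  # generative-only, not transferred
}
disc = {
    "item_emb": Param([0.0, 0.0], sparse=True),
    "attn.w": Param([0.0, 0.0], sparse=False),
    "ctr_head": Param([0.1], sparse=False),  # discriminative-only, keeps its own init
}
init_from_pretrained(disc, gen)
freeze_sparse(disc)
sgd_step(disc, {"item_emb": [1.0, 1.0], "attn.w": [1.0, 1.0], "ctr_head": [1.0]})
```

After the single SGD step, `item_emb` still holds the pretrained values, `attn.w` starts from the pretrained weights and moves, and the discriminative-only `ctr_head` trains from its own initialization. Whether dense parameters are also transferred, and exactly which parameters count as sparse, are choices this sketch fixes by assumption.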
