Break the Inaccessible Boundary: Distilling Post-Conversion Content for User Retention Modeling
Tianbao Ma, Ruochen Yang, Chengen Li, Yuexin Shi, Jiangxia Cao, Linxun Chen, Zhaojie Liu, Yanan Niu, Han Li, Kun Gai

TL;DR
This paper introduces OCARM, a two-stage distillation framework that enhances user retention prediction by implicitly capturing future onboarding content signals without feature leakage.
Contribution
The paper proposes a distillation-based method that incorporates future onboarding content into retention models while avoiding the feature leakage, and the resulting gap between training and serving, that using this content directly in training would cause.
Findings
Improved retention prediction accuracy in offline experiments.
Consistent online A/B test improvements in real-world scenarios.
Abstract
User retention is a key metric for measuring long-term engagement on modern platforms. In real-time bidding (RTB) advertising systems for user re-engagement, the retention model must predict the probability of a future revisit at bidding time, before the user converts and consumes any content. Although post-conversion content, termed Onboarding Content, provides highly informative signals for retention prediction, directly using it in training causes severe feature leakage and creates a gap between training and serving. To address this issue, we propose OCARM, a two-stage distillation-aligned framework for Onboarding Content Augmented Retention Modeling, which enables the model to implicitly capture future content using only observable features during inference. In the first stage, we deliberately expose onboarding content to train a hierarchical encoder that produces teacher representations.…
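The abstract is cut off before it describes the second stage, so the following is only a generic sketch of representation distillation, not OCARM's actual objective. It assumes a student that sees only bid-time features and is trained with a retention loss plus an alignment term that pulls its representation toward a precomputed teacher representation. The names `student_loss` and `align_weight`, and the choice of mean squared error for alignment, are hypothetical and do not come from the paper.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y (0 or 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_loss(student_rep, teacher_rep, p_retain, label, align_weight=0.5):
    """Retention loss plus an alignment penalty (hypothetical form).

    student_rep: representation computed from bid-time features only.
    teacher_rep: representation from a stage-one encoder that saw onboarding
                 content; treated here as a fixed target.
    p_retain:    the student's predicted revisit probability.
    """
    return bce(p_retain, label) + align_weight * mse(student_rep, teacher_rep)


# One training example: the student is close to, but not equal to, the teacher.
loss = student_loss([0.2, 0.9], [0.0, 1.0], p_retain=0.7, label=1)
```

In this sketch the teacher representation appears only inside the training loss. At serving time nothing but the student's forward pass on observable features would be needed, which is the property the abstract describes.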
