RecFlow: An Industrial Full Flow Recommendation Dataset
Qi Liu, Kai Zheng, Rui Huang, Wuchao Li, Kuo Cai, Yuan Chai, Yanan Niu, Yiqun Hui, Bing Han, Na Mou, Hongning Wang, Wentian Bao, Yunen Yu, Guorui Zhou, Han Li, Yang Song, Defu Lian, Kun Gai

TL;DR
RecFlow is a comprehensive industrial recommendation dataset that includes both exposed and unexposed items across multiple stages, enabling more realistic and effective algorithm development for real-world systems.
Contribution
The paper introduces RecFlow, the first dataset to include unexposed items at all stages of the recommendation pipeline, bridging the gap between offline benchmarks and online industrial environments.
Findings
Algorithms trained on RecFlow show improved online performance.
Stage-specific sampling enhances recommendation effectiveness.
Deployed algorithms demonstrate significant online gains.
Abstract
Industrial recommendation systems (RS) rely on a multi-stage pipeline to balance effectiveness and efficiency when delivering items from a vast corpus to users. Existing RS benchmark datasets primarily focus on the exposure space, where novel RS algorithms are trained and evaluated. However, when these algorithms transition to real-world industrial RS, they face the critical challenge of handling unexposed items, which occupy a significantly larger space than the exposed ones. This discrepancy profoundly impacts their practical performance. Additionally, these algorithms often overlook the intricate interplay between multiple RS stages, resulting in suboptimal overall system performance. To address these issues, we introduce RecFlow, an industrial full-flow recommendation dataset designed to bridge the gap between offline RS benchmarks and the real online environment. Unlike existing datasets,…
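The gap between the exposure space and the unexposed space described above can be illustrated with a toy funnel. This is a minimal sketch, not RecFlow's actual pipeline or schema: the stage names, the per-stage cut-offs, the random scores, and the `run_funnel` helper are all assumptions made for illustration.

```python
import random


def run_funnel(corpus, stages, score):
    """Pass candidates through successive top-k stages.

    Returns the finally exposed items and a log mapping every item to the
    stage that filtered it out (or "exposed" if it survived all stages).
    """
    log = {}
    candidates = list(corpus)
    for name, keep in stages:
        # Each stage ranks its incoming candidates and keeps only the top `keep`.
        ranked = sorted(candidates, key=lambda item: score(name, item), reverse=True)
        survivors, dropped = ranked[:keep], ranked[keep:]
        for item in dropped:
            log[item] = name  # unexposed: filtered at this stage
        candidates = survivors
    for item in candidates:
        log[item] = "exposed"
    return candidates, log


# Hypothetical setup: 1000 items, three illustrative stages, random scores.
rng = random.Random(0)
corpus = range(1000)
scores = {item: rng.random() for item in corpus}
stages = [("retrieval", 200), ("ranking", 50), ("re-ranking", 10)]

exposed, log = run_funnel(corpus, stages, lambda stage, item: scores[item])
unexposed = [item for item, stage in log.items() if stage != "exposed"]
print(len(exposed), len(unexposed))  # 10 990
```

A dataset restricted to the exposure space would contain only the 10 surviving items here. A full-flow log also records the 990 filtered items together with the stage at which each one was dropped, which is the kind of information stage-specific negative sampling needs.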
Peer Reviews
Decision: ICLR 2025 Poster
1. The proposed full-flow dataset provides a strong groundwork for follow-up research. For example, models can learn how to alleviate selection bias due to the discrepancy between the training and inference stages.
2. The authors performed comprehensive experiments and presented the results of the experiments with means and variances.
3. The complete datasets are available for further research.
1. The paper's current presentation lacks clarity and coherence, making it difficult to follow. Additionally, there are numerous minor grammatical and structural errors throughout the text.
2. While the initial explosion stage involves large-scale data, the subsequent re-ranking and edge-ranking stages utilize significantly smaller datasets. This inconsistency undermines the paper's claim of working with large-scale industrial data.
3. The paper's novelty is not effectively demonstrated through…
1. The paper addresses an essential and practical problem in real industrial recommendation. Full-stage recommendation is widespread in industry, and this dataset provides a new perspective on the problem.
2. The collection strategy is provided, and privacy protection is carefully considered.
3. Experiments are provided to show how to use this dataset.
1. Although collection and analysis are described, the collection procedure should be given in more detail to show that it is reasonable and correct. Moreover, the analysis is too simple, and more intuition about this dataset could be given.
2. The experiments provided to show how to use this dataset are interesting. However, in line 079, the authors argue that RecFlow can benefit ten tasks. Experiments on these tasks should therefore be provided.
3. There are some t…
1. The paper presents the first comprehensive large-scale dataset that captures the complete recommendation pipeline, filling a critical gap in the field where existing datasets only contain exposure data. It could enable further research into real-world problems that were previously difficult to study, e.g., distribution shift and stage interaction effects.
2. Good motivation is provided by clearly articulating the limitations of existing datasets and the importance of studying full recommendation p…
1. The paper doesn't adequately address the computational challenges of working with such a large dataset. Details about storage requirements and recommended sampling strategies would be valuable for practitioners.
2. The multi-task learning potential of the dataset is mentioned but not thoroughly explored. Given the rich set of user feedback signals, this seems like a missed opportunity.
3. While the authors mention online A/B testing validation, the details are sparse. More information about t…
Taxonomy
Topics: Data Stream Mining Techniques
Methods: Focus
