The Effects of Data Split Strategies on the Offline Experiments for CTR Prediction

Ramazan Tarik Turksoy; Beyza Turkmen

arXiv:2406.18320·cs.IR·June 27, 2024

TL;DR

This paper investigates how data split strategies, in particular random versus temporal splits, affect offline evaluation of CTR prediction models, and argues that realistic data partitioning is essential for trustworthy model assessment.

Contribution

The paper systematically compares random and temporal data splits in offline CTR prediction evaluation and emphasizes the importance of realistic data partitioning strategies.

Findings

01

Temporal splits better reflect real-world scenarios.

02

Data split strategy significantly impacts offline evaluation results.

03

Random splits may overestimate model performance.

Abstract

Click-through rate (CTR) prediction is a crucial task in online advertising to recommend products that users are likely to be interested in. To identify the best-performing models, rigorous model evaluation is necessary. Offline experimentation plays a significant role in selecting models for live user-item interactions, despite the value of online experimentation like A/B testing, which has its own limitations and risks. Often, the correlation between offline performance metrics and actual online model performance is inadequate. One main reason for this discrepancy is the common practice of using random splits to create training, validation, and test datasets in CTR prediction. In contrast, real-world CTR prediction follows a temporal order. Therefore, the methodology used in offline evaluation, particularly the data splitting strategy, is crucial. This study aims to address the…
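The difference between the two split strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' experimental protocol: the helper names (`random_split`, `temporal_split`, `leaks_future`), the `timestamp` field, the 80/20 ratio, and the synthetic log are all assumptions made for illustration. The sketch only shows that a random split mixes future interactions into the training set, while a temporal split trains strictly on the past.

```python
import random
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def random_split(rows: List[Row], test_fraction: float = 0.2,
                 seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows, then cut; ignores the order in which events occurred."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def temporal_split(rows: List[Row], test_fraction: float = 0.2,
                   time_key: str = "timestamp") -> Tuple[List[Row], List[Row]]:
    """Sort by time, then cut; the test set holds only the most recent events."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = len(ordered) - round(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


def leaks_future(train: List[Row], test: List[Row],
                 time_key: str = "timestamp") -> bool:
    """True if any training row is later than the earliest test row."""
    return max(r[time_key] for r in train) > min(r[time_key] for r in test)


# Hypothetical impression log: one event per time step (synthetic data).
logs = [{"timestamp": t, "clicked": t % 3 == 0} for t in range(100)]

rand_train, rand_test = random_split(logs)
time_train, time_test = temporal_split(logs)

print("random split leaks future data:  ", leaks_future(rand_train, rand_test))
print("temporal split leaks future data:", leaks_future(time_train, time_test))
```

In a real pipeline the cut would more likely be a calendar boundary (for example, the last day or week of logs) than a row fraction; the row-fraction cut is used here only to keep the two functions directly comparable.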

Taxonomy

Topics: Advanced X-ray and CT Imaging · Fault Detection and Control Systems