Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation
Sumin Shen, Huiying Mao, Zezhong Zhang, Zili Chen, Keyu Nie, Xinwei Deng

TL;DR
This paper introduces a clustering-based imputation method for handling incomplete purchase metrics in large-scale online experiments, improving data quality for decision-making.
Contribution
It proposes a novel imputation approach that combines stratification and clustering, specifically addressing dropout buyers with user-specific data.
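Stratification matters for scale because a brute-force neighbor search compares every pair of users. Partitioning users into strata first means comparisons happen only within each stratum. The sketch below is a generic illustration of that cost argument, not the paper's own complexity analysis. The stratification key (treatment arm plus a hypothetical device type) and the toy population are assumptions chosen for clarity.

```python
from collections import defaultdict

def stratify(users, key):
    """Group user records into strata by a (hypothetical) stratification key."""
    strata = defaultdict(list)
    for user in users:
        strata[key(user)].append(user)
    return dict(strata)

def pairwise_comparisons(sizes):
    """Number of user pairs a brute-force neighbor search would examine."""
    return sum(n * (n - 1) // 2 for n in sizes)

# Hypothetical toy population: 1,000 users spread evenly over
# 2 experiment arms x 5 device types, giving 10 strata of 100 users each.
users = [{"arm": i % 2, "device": i % 5} for i in range(1000)]
strata = stratify(users, key=lambda u: (u["arm"], u["device"]))

unstratified = pairwise_comparisons([len(users)])
stratified = pairwise_comparisons(len(s) for s in strata.values())
print(len(strata), unstratified, stratified)  # 10 499500 49500
```

With ten equal strata the number of comparisons drops roughly tenfold. Real strata are rarely balanced, so the actual saving depends on how users distribute across the key.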
Findings
The method outperforms traditional imputation techniques in simulations.
It effectively handles large-scale online experiment data.
Application at eBay demonstrates practical benefits.
Abstract
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics occur frequently in online experimentation, leaving much less usable data than was planned for the experiment (e.g., an A/B test). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using k-nearest neighbors. Our proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation of large-scale data sets in online experimentation, the proposed method…
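The core idea (user-specific imputation from nearby users who share experiment-specific features) can be sketched as stratified k-nearest-neighbor imputation. This is a minimal illustration under stated assumptions, not the authors' implementation. The field names (`stratum`, `activity`, `purchase`, `group`), the Euclidean distance on activity vectors, and the mean-of-neighbors rule are all hypothetical choices. The sketch imputes only rows labeled as dropout buyers; how visitors are handled is not specified here, so they are left untouched.

```python
import math
from collections import defaultdict

def knn_impute(users, k=3):
    """Hypothetical sketch: impute a missing purchase value for each dropout
    buyer as the mean purchase of its k nearest observed users (Euclidean
    distance on activity features) within the same stratum."""
    strata = defaultdict(list)
    for u in users:
        strata[u["stratum"]].append(u)

    imputed = {}
    for members in strata.values():
        donors = [u for u in members if u["purchase"] is not None]
        for u in members:
            # Only dropout buyers with a missing value are imputed here.
            if u["purchase"] is not None or u["group"] != "dropout_buyer":
                continue
            if not donors:
                continue  # no observed users in this stratum; leave missing
            nearest = sorted(
                donors, key=lambda d: math.dist(d["activity"], u["activity"])
            )[:k]
            imputed[u["id"]] = sum(d["purchase"] for d in nearest) / len(nearest)
    return imputed

# Toy data: strata stand in for experiment arms; activity vectors stand in
# for hypothetical shopping-path features (e.g., page views, add-to-cart count).
users = [
    {"id": "a", "stratum": "treatment", "activity": (1.0, 0.0), "purchase": 10.0, "group": "observed"},
    {"id": "b", "stratum": "treatment", "activity": (2.0, 0.0), "purchase": 20.0, "group": "observed"},
    {"id": "c", "stratum": "treatment", "activity": (9.0, 9.0), "purchase": 90.0, "group": "observed"},
    {"id": "d", "stratum": "treatment", "activity": (1.5, 0.0), "purchase": None, "group": "dropout_buyer"},
    {"id": "e", "stratum": "treatment", "activity": (0.0, 0.0), "purchase": None, "group": "visitor"},
    {"id": "f", "stratum": "control", "activity": (1.5, 0.0), "purchase": 50.0, "group": "observed"},
]
result = knn_impute(users, k=2)
print(result)  # {'d': 15.0}
```

User `d` receives the mean of its two closest treatment-arm neighbors (`a` and `b`). The control-arm user `f` is ignored despite identical activity, because neighbors are searched only within the stratum.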
