Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation

Sumin Shen; Huiying Mao; Zezhong Zhang; Zili Chen; Keyu Nie; Xinwei Deng

arXiv:2209.06125·cs.LG·April 10, 2023

TL;DR

This paper introduces a clustering-based imputation method for handling incomplete purchase metrics in large-scale online experiments, improving data quality for decision-making.

Contribution

It proposes an imputation approach that combines stratification with clustering and assigns user-specific imputed values to dropout buyers.

Findings

01. The method outperforms traditional imputation techniques in simulations.

02. It effectively handles large-scale online experiment data.

03. Application at eBay demonstrates practical benefits.

Abstract

In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics occur frequently in online experimentation, leaving much less data available than the online experiments (e.g., A/B tests) were planned to collect. In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. Our proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation of large-scale data sets in online experimentation, the proposed method…
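
As a rough illustration of the kind of nearest-neighbor imputation the abstract describes, the Python sketch below fills a missing purchase metric with the mean over the k nearest users whose metric was observed, searching only within the same stratum. It is not the authors' algorithm: the function name `knn_impute_by_stratum`, the Euclidean distance, the mean aggregation, and the toy data are all assumptions made for illustration, and the clustering step the abstract mentions for large-scale data is not reproduced here.

```python
import numpy as np

def knn_impute_by_stratum(features, metric, strata, k=3):
    """Hypothetical helper: fill NaN entries of `metric` with the mean of the
    k nearest observed users (Euclidean distance on `features`) that share
    the same stratum label. Assumed design, not the paper's exact method."""
    features = np.asarray(features, dtype=float)
    metric = np.asarray(metric, dtype=float)
    strata = np.asarray(strata)
    imputed = metric.copy()

    for s in np.unique(strata):
        in_stratum = strata == s
        observed = np.where(in_stratum & ~np.isnan(metric))[0]
        missing = np.where(in_stratum & np.isnan(metric))[0]
        if observed.size == 0:
            continue  # no donors in this stratum; leave the values missing
        for i in missing:
            # Distance from user i to every observed user in the stratum.
            dist = np.linalg.norm(features[observed] - features[i], axis=1)
            nearest = observed[np.argsort(dist)[:k]]
            imputed[i] = metric[nearest].mean()
    return imputed

# Toy data (assumed): one activity feature per user, two strata,
# and two users whose purchase metric is missing.
features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
metric = [10.0, np.nan, 12.0, 100.0, np.nan, 104.0]
strata = [0, 0, 0, 1, 1, 1]

result = knn_impute_by_stratum(features, metric, strata, k=2)
print(result)  # users 1 and 4 are filled with 11.0 and 102.0
```

Because donors are chosen per user from that user's own neighborhood, two users with missing metrics can receive different imputed values, which is the property the abstract highlights.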

Taxonomy

Topics: Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing

Methods: Dropout