No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction
Qinglin Jia, Zhaocheng Du, Chuhan Wu, Huifeng Guo, Ruiming Tang, Shuting Shi, Muyu Zhang

TL;DR
This paper introduces KAML, a framework that trains multi-task conversion rate (CVR) prediction models on incomplete, skewed multi-label data and improves performance in real-world advertising scenarios.
Contribution
The paper proposes KAML, a fine-grained knowledge transfer framework with attribution-driven masking, hierarchical knowledge extraction, and ranking loss to handle incomplete and skewed multi-label data.
Findings
Significant performance gains over existing MTL baselines.
Effective handling of incomplete multi-label data in industry datasets.
Successful online A/B test results demonstrating practical benefits.
Abstract
In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning (MTL) to train a unified model on post-click data to estimate the conversion rate (CVR) for these diverse targets. In practice, CVR prediction often encounters missing conversion data as many advertisers submit only a subset of user conversion actions due to privacy or other constraints, making the labels of multi-task data incomplete. If the model is trained on all available samples where advertisers submit user conversion actions, it may struggle when deployed to serve a subset of advertisers targeting specific conversion actions, as the training and deployment data distributions are mismatched. While considerable MTL efforts have been made, a long-standing challenge is how to effectively train a unified model with the incomplete…
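To make the incomplete-label setting in the abstract concrete, here is a minimal sketch of a common baseline: a multi-task binary cross-entropy loss that skips labels an advertiser never reported. This is a generic masking technique, not KAML's attribution-driven masking, whose details are not given in this summary. The function name `masked_bce` and the use of `None` to mark a missing label are assumptions made for illustration.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over observed labels only.

    probs:  one list per sample, holding a predicted probability per task.
    labels: same shape; 1 or 0 when the conversion action was reported,
            None when it was not (hypothetical encoding for this sketch).
    """
    eps = 1e-12  # keeps log() away from zero
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # unreported label: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    # With no observed labels at all, return 0.0 rather than divide by zero.
    return total / count if count else 0.0


# Two samples, two conversion tasks; the second task is unreported for sample 0.
probs = [[0.5, 0.9], [0.8, 0.2]]
labels = [[1, None], [1, 0]]
loss = masked_bce(probs, labels)
```

Because unreported labels are skipped rather than treated as negatives, the prediction for a missing task does not affect the loss. The loss is averaged only over observed entries, so the missing labels themselves carry no training signal in this baseline.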
Taxonomy
Topics: Text and Document Classification Technologies · Recommender Systems and Techniques · Spam and Phishing Detection
