No One Left Behind: How to Exploit the Incomplete and Skewed Multi-Label Data for Conversion Rate Prediction

Qinglin Jia; Zhaocheng Du; Chuhan Wu; Huifeng Guo; Ruiming Tang; Shuting Shi; Muyu Zhang

arXiv:2512.13300 · cs.LG · December 16, 2025

Open Access

TL;DR

This paper introduces KAML, a novel framework that effectively trains multi-task models on incomplete, skewed multi-label data for conversion rate prediction, improving performance in real-world advertising scenarios.

Contribution

The paper proposes KAML, a fine-grained knowledge transfer framework that combines attribution-driven masking, hierarchical knowledge extraction, and a ranking loss to handle incomplete and skewed multi-label data.
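The summary names a ranking loss but does not give its form. As an illustration only, the sketch below implements a generic pairwise logistic (RankNet/BPR-style) loss for a single task; the function names are hypothetical, and this is not claimed to be the loss used in KAML. A pairwise loss scores how well positives are ordered above negatives rather than per-sample probabilities, which is one common reason ranking terms are added alongside cross-entropy when positives are scarce.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    # log(1 + exp(x)) computed without overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_logistic_ranking_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: mean of log(1 + exp(-(s_pos - s_neg))) over all
    (positive, negative) pairs within one task.

    Generic pairwise ranking loss, assumed for illustration; NOT necessarily KAML's.
    Returns 0.0 when there is no positive or no negative example to pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    total = 0.0
    for sp in pos:
        for sn in neg:
            # Penalize each pair where the positive is not scored well above the negative
            total += _softplus(sn - sp)
    return total / (len(pos) * len(neg))
```

For example, `pairwise_logistic_ranking_loss([2.0, 0.5, -1.0], [1, 0, 0])` averages the two terms for the single positive compared with each negative. Averaging over pairs means the result depends only on relative ordering, not on how well the scores are calibrated as probabilities.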

Findings

01 Significant performance gains over existing MTL baselines.
02 Effective handling of incomplete multi-label data in industrial datasets.
03 Successful online A/B test results demonstrating practical benefits.

Abstract

In most real-world online advertising systems, advertisers typically have diverse customer acquisition goals. A common solution is to use multi-task learning (MTL) to train a unified model on post-click data to estimate the conversion rate (CVR) for these diverse targets. In practice, CVR prediction often encounters missing conversion data, as many advertisers submit only a subset of user conversion actions due to privacy or other constraints, making the labels of multi-task data incomplete. If the model is trained on all available samples where advertisers submit user conversion actions, it may struggle when deployed to serve a subset of advertisers targeting specific conversion actions, as the training and deployment data distributions are mismatched. While considerable MTL efforts have been made, a long-standing challenge is how to effectively train a unified model with the incomplete…
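A common baseline for the incomplete-label setting described in the abstract is to mask unreported labels out of each task's loss instead of treating them as negatives. The sketch below illustrates only that baseline idea under assumed inputs: the function name and the convention that `None` marks an action the advertiser did not report are hypothetical. It is not KAML's attribution-driven masking.

```python
import math
from typing import List, Optional, Sequence


def masked_multitask_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[int]]],
    eps: float = 1e-7,
) -> List[Optional[float]]:
    """Hypothetical helper: per-task binary cross-entropy that skips missing labels.

    probs[i][t]  : predicted conversion probability of sample i for task t
    labels[i][t] : 1 or 0 if action t was reported for sample i, None if unreported
    Returns the mean loss per task, or None for a task with no observed labels.
    """
    num_tasks = len(probs[0])
    losses: List[Optional[float]] = []
    for t in range(num_tasks):
        total, count = 0.0, 0
        for p_row, y_row in zip(probs, labels):
            y = y_row[t]
            if y is None:  # label missing: exclude this sample from task t's loss
                continue
            p = min(max(p_row[t], eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
        losses.append(total / count if count else None)
    return losses


# Usage: sample 0 has no label for task 1, so only sample 1 contributes to that task.
per_task = masked_multitask_bce([[0.9, 0.2], [0.1, 0.7]], [[1, None], [0, 1]])
```

Masking keeps unreported conversions from being learned as false negatives, but on its own it does not address the train/deploy distribution mismatch the abstract highlights, since every task head is still trained on whichever advertisers happen to report that action.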

Taxonomy

Topics: Text and Document Classification Technologies · Recommender Systems and Techniques · Spam and Phishing Detection