AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction
Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang

TL;DR
AntM$^{2}$C is the largest multi-scenario, multi-modal CTR dataset derived from industrial data, enabling comprehensive evaluation of recommendation models across diverse item types and features.
Contribution
The paper introduces AntM$^{2}$C, a large-scale, multi-scenario, multi-modal CTR dataset with 1 billion data points, covering multiple item types and including multi-modal features, filling significant gaps in existing datasets.
Findings
AntM$^{2}$C includes 5 item types, offering diverse user preference insights.
The dataset incorporates multi-modal features like text and images, enhancing modeling capabilities.
It is the largest-scale CTR dataset with 1 billion entries, enabling more reliable evaluation.
Abstract
Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Thirdly, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100…
Taxonomy
Topics: Recommender Systems and Techniques · Sentiment Analysis and Opinion Mining · Image Retrieval and Classification Techniques
