Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights
Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

TL;DR
This paper presents a two-phase framework for integrating multimodal data into Taobao's display advertising system. The framework targets two challenges, effectiveness and cost, and yields significant performance improvements.
Contribution
It introduces a novel two-phase approach combining multimodal pre-training and integration with ID-based models for industrial recommendation systems.
Findings
Significant performance improvements in Taobao advertising system
Effective and cost-efficient multimodal data integration method
Insights for practitioners on leveraging multimodal data
Abstract
Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including the Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance recommendation accuracy. We start by identifying the key challenges in adopting multimodal data in a manner that is both effective and cost-efficient for industrial systems. To address these challenges, we introduce a two-phase framework, including: 1) the pre-training of multimodal representations to capture semantic similarity, and 2) the integration of these representations with existing ID-based models. Furthermore, we detail the architecture of our production system, which is designed to facilitate the deployment of multimodal representations. Since the integration of…
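The abstract names the two phases but not their mechanics, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that phase 1 has already produced frozen per-item embeddings (the `pretrained` dictionary is hypothetical toy data) and that phase 2 feeds similarity scores between the candidate item and the user's behavior history into an ID-based model as extra dense features. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(candidate_vec, history_vecs):
    """Summarize candidate-vs-history similarity as [max, mean] (an assumed choice)."""
    if not history_vecs:
        return [0.0, 0.0]
    sims = [cosine(candidate_vec, h) for h in history_vecs]
    return [max(sims), sum(sims) / len(sims)]


def build_model_input(id_features, candidate_vec, history_vecs):
    """Append the similarity features to the existing ID-derived features."""
    return list(id_features) + similarity_features(candidate_vec, history_vecs)


# Hypothetical frozen embeddings standing in for the phase-1 output.
pretrained = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [1.0, 1.0],
}

history = [pretrained["item_a"], pretrained["item_b"]]
features = build_model_input([0.5, -0.2], pretrained["item_c"], history)
```

Passing similarity scores rather than raw embedding vectors is a design choice of this sketch: it keeps the downstream ID-based model's input small and leaves the pre-trained encoder untouched. Whether the production system does the same is not stated in the abstract.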
Taxonomy
Topics: Digital Communication and Language · Subtitles and Audiovisual Media · Multimedia Communication and Technology
