Enhancing Dynamic Image Advertising with Vision-Language Pre-training
Zhoufutu Wen, Xinyu Zhao, Zhipeng Jin, Yi Yang, Wei Jia, Xiaodong Chen, Shuanglong Li, Lin Liu

TL;DR
This paper introduces a vision-language pre-training framework for dynamic image advertising that improves query-image matching by leveraging large-scale image-text data and multi-objective fine-tuning, leading to better ad relevance and user engagement.
Contribution
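The paper proposes a two-stage vision-language framework for dynamic image advertising: a base model is first pre-trained on large-scale image-text pairs, then fine-tuned on advertising business data with multi-objective learning that unifies retrieval and relevance modeling instead of optimizing them separately.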
Findings
Improved CPM by 1.04% in online tests.
Enhanced CTR by 1.865% in real-world deployment.
Unified model outperforms separate optimization approaches.
Abstract
In the multimedia era, image is an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module performing ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data, and insufficient cross-modal interaction. Also, the separate optimization of retrieval and relevance models affects overall performance. To address this issue, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representation. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework…
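The abstract does not say which objectives are combined during fine-tuning, so the following is only a minimal sketch of what "unifying relevance modeling and retrieval through multi-objective learning" could look like. It assumes an in-batch contrastive loss for retrieval and a binary cross-entropy loss for relevance, joined by a weighted sum. The function names, the temperature value, and the weights are hypothetical and do not come from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def contrastive_loss(query_embs, image_embs, temperature=0.07):
    """Assumed retrieval objective: query i should score highest on image i
    among all images in the batch (softmax over cosine similarities)."""
    queries = [normalize(v) for v in query_embs]
    images = [normalize(v) for v in image_embs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, m) / temperature for m in images]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)

def relevance_loss(logits, labels):
    """Assumed relevance objective: binary cross-entropy on raw logits,
    written in the numerically stable form."""
    total = 0.0
    for s, y in zip(logits, labels):
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(labels)

def multi_objective_loss(query_embs, image_embs, rel_logits, rel_labels,
                         w_retrieval=1.0, w_relevance=1.0):
    """Weighted sum of the two objectives (weights are illustrative)."""
    return (w_retrieval * contrastive_loss(query_embs, image_embs)
            + w_relevance * relevance_loss(rel_logits, rel_labels))
```

Sharing one encoder across both terms is what lets a single fine-tuned model serve retrieval and relevance scoring. In practice the relevance logits would come from a cross-modal head on the same backbone; here they are passed in directly to keep the sketch self-contained.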
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
Methods: Balanced Selection
