Enhancing Dynamic Image Advertising with Vision-Language Pre-training

Zhoufutu Wen; Xinyu Zhao; Zhipeng Jin; Yi Yang; Wei Jia; Xiaodong Chen; Shuanglong Li; Lin Liu

arXiv:2306.14112 · cs.IR · June 27, 2023

TL;DR

This paper introduces a vision-language pre-training framework for dynamic image advertising that improves query-image matching by leveraging large-scale image-text data and multi-objective fine-tuning, leading to better ad relevance and user engagement.

Contribution

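The paper proposes a two-stage vision-language framework for dynamic image advertising: a base model is first pre-trained on large-scale image-text pairs, then fine-tuned on advertising data with multi-objective learning that unifies ad image retrieval and relevance modeling instead of optimizing them separately.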

Findings

01 Improved cost per mille (CPM) by 1.04% in online tests.

02 Improved click-through rate (CTR) by 1.865% in real-world deployment.

03 A unified model for retrieval and relevance outperforms optimizing the two separately.
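
For readers unfamiliar with the metrics, here is a minimal sketch of how CTR, CPM, and a relative lift between two traffic groups are commonly computed. The function names and the traffic numbers are hypothetical and are not taken from the paper; the page does not describe how the online test was measured.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks per impression."""
    return clicks / impressions


def cpm(revenue: float, impressions: int) -> float:
    """Cost per mille: ad revenue per 1,000 impressions."""
    return revenue / impressions * 1000.0


def relative_lift(treatment: float, control: float) -> float:
    """Relative change of treatment over control, in percent."""
    return (treatment - control) / control * 100.0


# Hypothetical A/B traffic, NOT figures from the paper.
control_ctr = ctr(clicks=2_000, impressions=100_000)
treatment_ctr = ctr(clicks=2_050, impressions=100_000)
ctr_lift = relative_lift(treatment_ctr, control_ctr)

control_cpm = cpm(revenue=500.0, impressions=100_000)
treatment_cpm = cpm(revenue=505.0, impressions=100_000)
cpm_lift = relative_lift(treatment_cpm, control_cpm)
```

A reported gain such as "CTR improved by 1.865%" is usually a relative lift of this kind, not an absolute change in percentage points.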

Abstract

In the multimedia era, images are an effective medium in search advertising. Dynamic Image Advertising (DIA), a system that matches queries with ad images and generates multimodal ads, is introduced to improve user experience and ad revenue. The core of DIA is a query-image matching module that performs ad image retrieval and relevance modeling. Current query-image matching suffers from limited and inconsistent data and from insufficient cross-modal interaction. In addition, optimizing the retrieval and relevance models separately hurts overall performance. To address these issues, we propose a vision-language framework consisting of two parts. First, we train a base model on large-scale image-text pairs to learn general multimodal representations. Then, we fine-tune the base model on advertising business data, unifying relevance modeling and retrieval through multi-objective learning. Our framework…
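
The fine-tuning stage described in the abstract can be illustrated with a minimal sketch of a multi-objective loss. The abstract does not state which losses are used, so everything here is an assumption: an in-batch contrastive loss stands in for the retrieval objective, binary cross-entropy stands in for the relevance objective, and the names `contrastive_loss`, `relevance_loss`, `multi_objective_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def contrastive_loss(query_embs, image_embs, temperature=0.07):
    """In-batch softmax loss (assumed retrieval objective):
    query i should score its paired image i above the other images."""
    q = [normalize(v) for v in query_embs]
    m = [normalize(v) for v in image_embs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, mj) / temperature for mj in m]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]
    return total / len(q)


def relevance_loss(scores, labels):
    """Binary cross-entropy on raw relevance scores (assumed relevance objective)."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)


def multi_objective_loss(query_embs, image_embs, scores, labels, alpha=0.5):
    """Weighted sum of the two objectives; alpha is a hypothetical weight."""
    retrieval = contrastive_loss(query_embs, image_embs)
    relevance = relevance_loss(scores, labels)
    return alpha * retrieval + (1.0 - alpha) * relevance


# Toy batch of two query-image pairs with made-up embeddings and labels.
queries = [[1.0, 0.0], [0.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8]]
loss = multi_objective_loss(queries, images, scores=[2.0, -1.0], labels=[1, 0])
```

Sharing one encoder across both terms is what lets a single model serve retrieval and relevance scoring; how the paper actually weights or combines its objectives is not stated on this page.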

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Balanced Selection