CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang

TL;DR
This paper introduces a novel approach using Multimodal Large Language Models optimized with reinforcement learning to generate advertising images that maximize click-through rates, outperforming existing methods in relevance and effectiveness.
Contribution
The authors develop a CTR-optimized image generation framework with a new reward model and product-centric strategy, advancing the use of MLLMs in advertising.
Findings
Achieves state-of-the-art CTR performance in online and offline tests.
Effectively aligns generated images with product features.
Demonstrates the benefit of reinforcement learning in multimodal image generation.
Abstract
In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness. Most existing methods that generate backgrounds for products focus primarily on aesthetic quality, which may fail to achieve satisfactory online performance. To address this limitation, we explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. First, we build targeted pre-training tasks and leverage a large-scale e-commerce multimodal dataset to equip MLLMs with initial capabilities for advertising image generation tasks. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL), which can jointly utilize multimodal features and accurately reflect user click preferences.…
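The idea of steering a generator with a click-preference reward can be illustrated with a toy sketch. This is not the authors' method: it assumes a categorical policy over three hypothetical candidate backgrounds and made-up reward scores standing in for a reward model's CTR estimate, and it applies the exact policy-gradient update to show probability mass shifting toward the highest-reward candidate. The names `optimize_policy` and `toy_rewards` are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_policy(rewards, steps=500, lr=10.0):
    """Gradient ascent on expected reward for a categorical policy.

    rewards[k] is a hypothetical reward-model score (a stand-in for
    predicted CTR) for candidate k. Returns the final probabilities.
    """
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact policy gradient: dJ/dlogit_k = p_k * (r_k - E[r]).
        logits = [
            z + lr * p * (r - expected)
            for z, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


# Hypothetical scores for three candidate backgrounds (not from the paper).
toy_rewards = [0.02, 0.05, 0.03]
final_probs = optimize_policy(toy_rewards)
best = max(range(len(final_probs)), key=final_probs.__getitem__)
print(best, [round(p, 3) for p in final_probs])
```

In a real system the policy would be the MLLM itself and the reward would come from a learned model rather than a fixed list. The sketch only shows the direction of the update.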
Taxonomy
Topics: Topic Modeling
Methods: Softmax · Attention Is All You Need · Focus
