One Size, Many Fits: Aligning Diverse Group-Wise Click Preferences in Large-Scale Advertising Image Generation

Shuo Lu; Haohan Wang; Wei Feng; Weizhen Wang; Shen Zhang; Yaoyu Li; Ao Ma; Zheng Zhang; Jingjing Lv; Junjie Shen; Ching Law; Bing Zhan; Yuan Xu; Huizai Yao; Yongcan Yu; Chenyang Si; and Jian Liang

arXiv:2602.02033·cs.CV·February 4, 2026

Open Access

TL;DR

This paper introduces a unified framework called OSMF that aligns diverse user group preferences in advertising image generation, improving targeted marketing effectiveness through adaptive grouping, preference-conditioned generation, and preference alignment.

Contribution

The paper presents a novel framework combining adaptive user grouping, a preference-aware multimodal model, and a new dataset for large-scale group-wise preference alignment in advertising images.

Findings

01 Achieves state-of-the-art offline performance

02 Improves online CTR for targeted groups

03 Introduces the first large-scale group preference dataset

Abstract

Advertising image generation has increasingly focused on online metrics like Click-Through Rate (CTR), yet existing approaches adopt a "one-size-fits-all" strategy that optimizes for overall CTR while neglecting preference diversity among user groups. This leads to suboptimal performance for specific groups, limiting targeted marketing effectiveness. To bridge this gap, we present One Size, Many Fits (OSMF), a unified framework that aligns diverse group-wise click preferences in large-scale advertising image generation. OSMF begins with product-aware adaptive grouping, which dynamically organizes users based on their attributes and product characteristics, representing each group with rich collective preference features. Building on these groups, preference-conditioned image generation employs a Group-aware Multimodal Large Language Model (G-MLLM) to generate tailored images…
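The "one-size-fits-all" limitation described in the abstract can be illustrated with a toy calculation. The sketch below is not OSMF and not taken from the paper: the impression log, the group labels `A` and `B`, the image names, and the helper functions are all hypothetical, chosen only to show how the image with the best overall CTR can differ from the image a particular group prefers.

```python
from collections import defaultdict

# Hypothetical impression log of (user_group, image_id, clicked) rows.
# All numbers are invented for illustration; none come from the paper.
LOG = (
    [("A", "img1", 1)] * 8 + [("A", "img1", 0)] * 92    # group A, img1: 8% CTR
    + [("A", "img2", 1)] * 2 + [("A", "img2", 0)] * 98  # group A, img2: 2% CTR
    + [("B", "img1", 1)] * 3 + [("B", "img1", 0)] * 97  # group B, img1: 3% CTR
    + [("B", "img2", 1)] * 7 + [("B", "img2", 0)] * 93  # group B, img2: 7% CTR
)


def ctr_table(log, by_group):
    """Return CTR keyed by image, or by (group, image) when by_group is True."""
    clicks = defaultdict(int)
    shows = defaultdict(int)
    for group, image, clicked in log:
        key = (group, image) if by_group else image
        clicks[key] += clicked
        shows[key] += 1
    return {key: clicks[key] / shows[key] for key in shows}


def best_overall(log):
    """Pick the single image with the highest CTR pooled over all users."""
    table = ctr_table(log, by_group=False)
    return max(table, key=table.get)


def best_per_group(log):
    """Pick, for each group separately, the image with the highest CTR."""
    table = ctr_table(log, by_group=True)
    best = {}
    for (group, image), ctr in table.items():
        if group not in best or ctr > table[(group, best[group])]:
            best[group] = image
    return best


if __name__ == "__main__":
    print("overall winner:", best_overall(LOG))
    print("per-group winners:", best_per_group(LOG))
```

In this invented log, `img1` wins on pooled CTR (5.5% versus 4.5%), so a strategy that optimizes only overall CTR would serve it to everyone, even though group `B` clicks `img2` more than twice as often as `img1`. This is the kind of group-level gap that the abstract says a single-objective approach neglects.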

Taxonomy

Topics: Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing · Generative Adversarial Networks and Image Synthesis