Personalized Image Generation with Large Multimodal Models
Yiyan Xu, Wenjie Wang, Yang Zhang, Biao Tang, Peng Yan, Fuli Feng, Xiangnan He

TL;DR
This paper introduces Pigeon, a framework leveraging large multimodal models for personalized image generation, effectively capturing user preferences from noisy data and limited supervision, demonstrated through sticker and poster generation.
Contribution
The paper presents a novel personalized image generation framework with a two-stage preference alignment scheme and modules for capturing user preferences from noisy data.
Findings
Pigeon outperforms baseline models in quantitative metrics.
Human evaluations favor Pigeon's generated images.
Preference alignment reduces the impact of noisy data.
Abstract
Personalized content filtering, such as recommender systems, has become a critical infrastructure to alleviate information overload. However, these systems merely filter existing content and are constrained by its limited diversity, making it difficult to meet users' varied content needs. To address this limitation, personalized content generation has emerged as a promising direction with broad applications. Nevertheless, most existing research focuses on personalized text generation, with relatively little attention given to personalized image generation. The limited work in personalized image generation faces challenges in accurately capturing users' visual preferences and needs from noisy user-interacted images and complex multimodal instructions. Worse still, there is a lack of supervised data for training personalized image generation models. To overcome the challenges, we…
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · ALIGN
