TL;DR
Premier introduces a personalized image generation framework that uses learnable user embeddings and a preference adapter to better capture and modulate individual user preferences in text-to-image synthesis.
Contribution
The paper proposes a preference modulation method with a dispersion loss that keeps user embeddings distinct, improving personalization accuracy and generalization when per-user data is limited.
Findings
Premier outperforms prior methods in preference alignment.
It achieves higher scores on text consistency and ViPer proxy metrics.
Expert evaluations favor Premier's personalized outputs.
Abstract
Text-to-image generation has advanced rapidly, yet it still struggles to capture nuanced user preferences. Existing approaches typically rely on multimodal large language models to infer user preferences, but the derived prompts or latent codes rarely reflect them faithfully, leading to suboptimal personalization. We present Premier, a novel preference modulation framework for personalized image generation. Premier represents each user's preference as a learnable embedding and introduces a preference adapter that fuses the user embedding with the text prompt. To enable accurate and fine-grained preference control, the fused preference embedding is further used to modulate the generative process. To enhance the distinctness of individual preferences and improve alignment between outputs and user-specific styles, we incorporate a dispersion loss that enforces separation among user…
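The abstract does not give the exact form of the dispersion loss, so the following is only a minimal sketch of one common way to push a set of embeddings apart: an InfoNCE-style repulsion term over pairwise distances. The function name `dispersion_loss`, the temperature `tau`, and the L2 normalization are assumptions made for illustration, not details taken from Premier.

```python
import numpy as np


def dispersion_loss(user_emb: np.ndarray, tau: float = 0.5) -> float:
    """Hypothetical dispersion term over a table of user embeddings.

    user_emb: array of shape (num_users, dim), one row per user.
    tau: temperature (an assumed hyperparameter, not from the paper).

    Returns log(mean(exp(-||z_i - z_j||^2 / tau))) over all pairs i != j,
    where z is the L2-normalized embedding. The value is 0 when every
    embedding coincides and decreases as the embeddings spread apart.
    """
    # Normalize each row so the loss depends on direction, not scale.
    z = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)

    # Pairwise squared Euclidean distances, shape (num_users, num_users).
    diff = z[:, None, :] - z[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Exclude the diagonal (distance of each user to itself).
    off_diag = ~np.eye(z.shape[0], dtype=bool)
    return float(np.log(np.mean(np.exp(-sq_dist[off_diag] / tau))))


# Collapsed embeddings (every user identical) versus well-separated ones.
collapsed = np.ones((4, 8))
separated = np.eye(4)

loss_collapsed = dispersion_loss(collapsed)
loss_separated = dispersion_loss(separated)
```

Under these assumptions, minimizing the term alongside the usual generation objective would penalize user embeddings that collapse onto one another, which is the behavior the abstract attributes to the dispersion loss.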
