TL;DR
This paper introduces DrUM, a condition-level modeling approach using transformer-based adapters in text-to-image diffusion models to improve personalized image generation by effectively incorporating user preferences.
Contribution
The paper presents DrUM, a novel method that enables personalized generation through condition-level modeling in the latent space without additional fine-tuning of large models.
Findings
Strong performance on large-scale datasets
Compatible with open-source text encoders
Effective personalization without extra fine-tuning
Abstract
Personalized generation in text-to-image (T2I) diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.
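To make the distinction between prompt-level and condition-level modeling concrete, here is a minimal sketch of the general idea. It is not DrUM's architecture, which the abstract does not specify. It assumes the text encoder has already produced one embedding per prompt token and one embedding per item in the user's history, and it replaces the learned transformer adapter with a single unparameterized cross-attention step followed by a residual blend. The function name `personalize_condition`, the blend weight `alpha`, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def personalize_condition(prompt_tokens, history, alpha=0.5):
    """Blend each prompt-token embedding with a summary of the user history.

    Hypothetical stand-in for a learned adapter: each prompt token attends
    over the history embeddings (scaled dot-product attention, no learned
    weights), and the attended context is mixed into the token with weight
    alpha. The prompt text itself is never edited, so no input tokens are
    spent on describing the user.
    """
    dim = len(prompt_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    personalized = []
    for query in prompt_tokens:
        # Attention weights of this prompt token over the history items.
        weights = softmax([dot(query, item) * scale for item in history])
        # Attention-weighted average of the history embeddings.
        context = [
            sum(w * item[i] for w, item in zip(weights, history))
            for i in range(dim)
        ]
        # Residual blend: alpha = 0 leaves the original condition untouched.
        personalized.append(
            [(1 - alpha) * q + alpha * c for q, c in zip(query, context)]
        )
    return personalized


# Toy example with 2-dimensional embeddings (made-up values).
prompt = [[1.0, 0.0], [0.0, 1.0]]
history = [[1.0, 0.0], [1.0, 0.0]]
cond = personalize_condition(prompt, history, alpha=0.5)
```

The output has the same shape as the prompt embeddings, so under these assumptions it could be passed to a frozen diffusion model wherever the original text condition would go, which is one way to read the abstract's claim that no additional fine-tuning of the foundation model is required.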
