Finetuning-Free Personalization of Text to Image Generation via Hypernetworks
Sagar Shrestha, Gopal Sharma, Luowei Zhou, Suren Kumar

TL;DR
This paper introduces a fine-tuning-free method for personalizing text-to-image models using hypernetworks that predict adapted weights directly from subject images, eliminating the need for costly per-subject training.
Contribution
The paper presents an end-to-end training objective for hypernetworks, stabilized by output regularization, that enables reliable personalization with no fine-tuning at inference. It also introduces Hybrid-Model Classifier-Free Guidance (HM-CFG) for better compositional generalization.
Findings
Achieves strong personalization performance on multiple datasets.
Removes the need for per-subject optimization during inference.
Enhances compositional generalization with HM-CFG during sampling.
Abstract
Personalizing text-to-image diffusion models has traditionally relied on subject-specific fine-tuning approaches such as DreamBooth (Ruiz et al., 2023), which are computationally expensive and slow at inference. Recent adapter- and encoder-based methods attempt to reduce this overhead but still depend on additional fine-tuning or large backbone models for satisfactory results. In this work, we revisit an orthogonal direction: fine-tuning-free personalization via hypernetworks that predict LoRA-adapted weights directly from subject images. Prior hypernetwork-based approaches, however, suffer from costly data generation or unstable attempts to mimic base model optimization trajectories. We address these limitations with an end-to-end training objective, stabilized by a simple output regularization, yielding reliable and effective hypernetworks. Our method removes the need for…
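The core idea in the abstract, a hypernetwork that maps a subject image to LoRA weights, can be illustrated with a toy sketch. This is not the authors' implementation. The single linear layer, the class and function names, the stand-in embedding vector, and the squared-L2 penalty used as the "output regularization" are all assumptions made for illustration; the excerpt does not specify the architecture or the exact regularizer.

```python
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols list of rows."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]

class LinearHypernetwork:
    """Hypothetical toy hypernetwork: one linear map from a subject
    embedding to the two low-rank LoRA factors A and B."""

    def __init__(self, embed_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        n_outputs = rank * d_in + d_out * rank  # entries of A plus entries of B
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                     for _ in range(n_outputs)]

    def predict_lora(self, embedding):
        """Predict A (rank x d_in) and B (d_out x rank) from an embedding."""
        flat = [sum(w * e for w, e in zip(row, embedding)) for row in self.proj]
        split = self.rank * self.d_in
        A = reshape(flat[:split], self.rank, self.d_in)
        B = reshape(flat[split:], self.d_out, self.rank)
        return A, B

def adapt_weight(W, A, B, scale=1.0):
    """Standard LoRA update: W' = W + scale * (B @ A)."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def output_regularizer(A, B):
    """Assumed regularizer: squared L2 norm of the predicted factors.
    In training it would be added to the main loss with a small weight."""
    return sum(v * v for M in (A, B) for row in M for v in row)

# Usage: one forward pass replaces per-subject fine-tuning.
hyper = LinearHypernetwork(embed_dim=4, d_out=3, d_in=5, rank=2)
embedding = [0.5, -1.0, 0.25, 2.0]   # stand-in for an image-encoder feature
A, B = hyper.predict_lora(embedding)
W = [[0.0] * 5 for _ in range(3)]    # stand-in for a frozen base-model weight
W_adapted = adapt_weight(W, A, B)
penalty = output_regularizer(A, B)
```

At inference, personalization in this sketch costs a single call to `predict_lora`, which is the property the abstract contrasts with per-subject optimization. A real system would predict factors for many layers of the diffusion model and train the hypernetwork through the generation loss.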
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
