IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation
Amritanshu Tiwari, Cherish Puniani, Kaustubh Sharma, Ojasva Nema

TL;DR
This paper introduces IMAGE-ALCHEMY, a novel two-stage method for personalized text-to-image generation that maintains subject fidelity and broad scene diversity by fine-tuning attention weights with LoRA on SDXL.
Contribution
The paper presents a new pipeline combining LoRA fine-tuning and segmentation-driven Img2Img to improve personalized subject fidelity without sacrificing scene diversity.
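As a rough illustration of what LoRA fine-tuning of an attention weight involves, the sketch below applies the standard low-rank update W + (alpha / r) * B A to a toy matrix in plain Python. The dimensions, rank, and alpha are arbitrary values chosen for the example; the summary does not state the paper's actual hyperparameters, and a real setup would apply this to the SDXL U-Net's attention projections through a training library.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


random.seed(0)
d_out, d_in, rank = 8, 8, 2  # toy sizes, not the paper's settings
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op

w_adapted = lora_weight(w, a, b, alpha=4.0)
full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters trained by LoRA
```

Because B starts at zero, the adapted weight equals the base weight before training, and only A and B are updated afterwards. The base weights stay frozen, which is one reason this kind of adapter is associated with retaining the model's broader generative behaviour.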
Findings
Achieves a DINO similarity score of 0.789 on SDXL.
Outperforms existing personalized text-to-image methods.
Preserves broader generative capabilities while integrating new subjects.
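The summary does not say how the DINO score is computed. A common definition in the personalization literature is the average pairwise cosine similarity between DINO image embeddings of generated and reference images; the sketch below assumes that definition and uses hand-written toy vectors in place of real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(generated, references):
    """Average cosine similarity over every (generated, reference) pair."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    return sum(scores) / len(scores)


# Toy stand-ins for image embeddings (assumption: real ones come from a DINO ViT).
references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
generated = [[1.0, 0.0, 0.0]]
score = mean_pairwise_similarity(generated, references)
```

Under this definition a score closer to 1 means the generated subject's embeddings lie closer to those of the reference images.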
Abstract
Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects based on a few reference images remains challenging. This often leads to catastrophic forgetting, overfitting, or large computational overhead. We propose a two-stage pipeline that addresses these limitations by leveraging LoRA-based fine-tuning on the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. First, we use the unmodified SDXL to generate a generic scene by replacing the subject with its class label. Then, we selectively insert the personalized subject through a segmentation-driven image-to-image (Img2Img) pipeline that uses the trained LoRA weights. This framework isolates the subject encoding from the overall composition, thus…
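The control flow of the two stages described in the abstract can be sketched as follows. Everything here is hypothetical scaffolding: `generate`, `segment`, and `img2img` are placeholder callables standing in for the unmodified SDXL, a segmentation model, and the LoRA-adapted Img2Img pass, and the pixel-wise mask compositing is an assumption about how the edit is restricted to the subject region, not a detail confirmed by the source.

```python
def to_class_prompt(prompt, subject_token, class_label):
    """Stage 1 prompt: swap the personalized subject for its class label."""
    return prompt.replace(subject_token, class_label)


def composite(scene, edited, mask):
    """Keep `edited` pixels where the mask is 1 and `scene` pixels where it is 0."""
    return [[m * e + (1 - m) * s for s, e, m in zip(s_row, e_row, m_row)]
            for s_row, e_row, m_row in zip(scene, edited, mask)]


def two_stage(prompt, subject_token, class_label, generate, segment, img2img):
    # Stage 1: the base model renders a generic scene from the class-level prompt.
    scene = generate(to_class_prompt(prompt, subject_token, class_label))
    # Stage 2: locate the class instance, then re-render with the personalized model.
    mask = segment(scene, class_label)
    edited = img2img(scene, prompt)
    return composite(scene, edited, mask)


# Toy stand-ins operating on 2x2 grayscale "images" (all values are made up).
seen_prompts = []


def fake_generate(p):
    seen_prompts.append(p)
    return [[0.2, 0.2], [0.2, 0.2]]


def fake_segment(image, label):
    return [[1, 0], [0, 0]]  # subject occupies the top-left pixel


def fake_img2img(image, p):
    return [[0.9, 0.9], [0.9, 0.9]]


result = two_stage("a sks dog on a beach", "sks dog", "dog",
                   fake_generate, fake_segment, fake_img2img)
```

In this toy run only the masked pixel takes the personalized rendering, while the rest of the scene is left as the base model produced it, which mirrors the abstract's point about isolating the subject from the overall composition.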
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship
Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Max Pooling · Vision Transformer · Convolution
