IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation

Amritanshu Tiwari; Cherish Puniani; Kaustubh Sharma; Ojasva Nema

arXiv:2505.10743 · cs.CV · May 19, 2025

Open Access

TL;DR

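This paper introduces IMAGE-ALCHEMY, a two-stage method for personalized text-to-image generation. It fine-tunes the U-Net attention weights of SDXL with LoRA, then inserts the personalized subject into a generically generated scene through segmentation-driven Img2Img, with the aim of preserving both subject fidelity and scene diversity.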

Contribution

The paper presents a new pipeline combining LoRA fine-tuning and segmentation-driven Img2Img to improve personalized subject fidelity without sacrificing scene diversity.
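
The idea of confining the personalized model to the subject region can be illustrated with a toy mask composite. This is only a sketch: the summary does not describe the Img2Img step in detail, so it is reduced here to a per-pixel selection. The function name `composite_subject`, the grayscale list-of-rows image representation, and the binary mask are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Image = List[List[float]]  # toy grayscale image: rows of intensities in [0, 1]
Mask = List[List[int]]     # 1 where the subject was segmented, 0 elsewhere


def composite_subject(scene: Image, personalized: Image, mask: Mask) -> Image:
    """Keep scene pixels outside the mask; take personalized pixels inside it.

    Hypothetical stand-in for a segmentation-driven insertion step: the
    background comes from the generic scene, the subject region from the
    personalized output.
    """
    if not (len(scene) == len(personalized) == len(mask)):
        raise ValueError("scene, personalized, and mask must have the same height")
    out: Image = []
    for s_row, p_row, m_row in zip(scene, personalized, mask):
        if not (len(s_row) == len(p_row) == len(m_row)):
            raise ValueError("all rows must have the same width")
        out.append([p if m else s for s, p, m in zip(s_row, p_row, m_row)])
    return out


# Toy 2x3 example: only the centre pixel of the second row is "subject".
scene = [[0.1, 0.1, 0.1], [0.1, 0.5, 0.1]]
personalized = [[0.9, 0.9, 0.9], [0.9, 0.8, 0.9]]
mask = [[0, 0, 0], [0, 1, 0]]
result = composite_subject(scene, personalized, mask)
```

In this toy setting the background is untouched by construction, which mirrors the stated goal of keeping scene composition separate from subject encoding.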

Findings

1. Achieves a DINO similarity score of 0.789 on SDXL.
2. Outperforms existing personalized text-to-image methods.
3. Preserves broader generative capabilities while integrating new subjects.
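
For context on the DINO figure, a DINO similarity score is commonly computed as the average cosine similarity between DINO image embeddings of generated and reference images. The sketch below shows that computation on placeholder vectors. Whether the paper averages over all generated/reference pairs in exactly this way is an assumption, and the embeddings here are made-up toy values rather than real DINO features.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(
    generated: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity over every (generated, reference) pair."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    if not scores:
        raise ValueError("need at least one generated and one reference embedding")
    return sum(scores) / len(scores)


# Placeholder 2-D "embeddings"; a real evaluation would use DINO features.
generated_embeddings = [[1.0, 0.0], [0.0, 1.0]]
reference_embeddings = [[1.0, 0.0]]
score = mean_pairwise_similarity(generated_embeddings, reference_embeddings)
```

Here one generated vector matches the reference exactly and the other is orthogonal to it, so the mean lands halfway between 1 and 0.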

Abstract

Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects based on a few reference images remains challenging. This often leads to catastrophic forgetting, overfitting, or large computational overhead. We propose a two-stage pipeline that addresses these limitations by leveraging LoRA-based fine-tuning on the attention weights within the U-Net of the Stable Diffusion XL (SDXL) model. First, we use the unmodified SDXL to generate a generic scene by replacing the subject with its class label. Then, we selectively insert the personalized subject through a segmentation-driven image-to-image (Img2Img) pipeline that uses the trained LoRA weights. This framework isolates the subject encoding from the overall composition, thus…
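
As background for the LoRA-based fine-tuning mentioned in the abstract, the sketch below shows the standard low-rank update, in which a frozen weight matrix W is augmented as W + (alpha / r) · B · A with small trainable factors B and A. The matrix sizes, the values, and the helper names `matmul` and `lora_merge` are illustrative assumptions; the abstract does not state the rank or scaling the authors use.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def lora_merge(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: a frozen 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (stays unchanged)
B = [[1.0], [2.0]]             # trainable 2x1 factor
A = [[0.5, -0.5]]              # trainable 1x2 factor
merged = lora_merge(W, B, A, alpha=2.0, rank=1)
```

Only B and A would be trained in such a setup, so the base weights can be used unmodified for one stage and with the adapter applied for another, which is consistent with the two-stage description above.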

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship

Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Max Pooling · Vision Transformer · Convolution