A LoRA is Worth a Thousand Pictures
Chenxi Liu, Towaki Takikawa, Alec Jacobson

TL;DR
This paper demonstrates that LoRA weights alone can describe artistic styles, enabling style clustering and retrieval without generating images or knowing the original training set.
Contribution
The paper shows that LoRA weights alone can serve as style descriptors, outperforming traditional pre-trained features such as CLIP and DINO in style clustering and supporting style retrieval. It also discusses future applications such as zero-shot fine-tuning.
Findings
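LoRA weights outperform traditional pre-trained features, such as CLIP and DINO, in clustering artistic styles.
LoRA-based embeddings show strong structural similarity to conventional image-based embeddings, both qualitatively and quantitatively.
The approach enables accurate style retrieval without knowledge of the original training images.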
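The retrieval finding above can be illustrated with a minimal sketch. It is not the authors' pipeline: the layer names, matrix sizes, PCA reduction, cosine similarity, and synthetic "LoRAs" below are all assumptions made for illustration. The only LoRA fact it relies on is that each adapted layer stores a low-rank update B @ A. Each LoRA is flattened into one vector, the vectors are reduced with PCA, and nearest neighbours are ranked by cosine similarity, with no images involved at any step.

```python
import numpy as np

def lora_descriptor(lora):
    """Flatten a LoRA into one vector.

    `lora` maps a layer name to a (B, A) pair. The per-layer update is B @ A,
    which is unchanged if the two factors are rescaled against each other.
    """
    return np.concatenate([(lora[n][0] @ lora[n][1]).ravel() for n in sorted(lora)])

def pca_reduce(X, dim):
    """Project rows of X onto their top `dim` principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def retrieve(Z, query_idx, top_k):
    """Indices of the top_k rows of Z most cosine-similar to row query_idx."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sims = Zn @ Zn[query_idx]
    sims[query_idx] = -np.inf  # never return the query itself
    return np.argsort(-sims)[:top_k]

# Synthetic stand-ins for LoRA checkpoints (hypothetical names and sizes).
rng = np.random.default_rng(0)
d_out, d_in, rank = 16, 12, 4
layers = ["attn.to_q", "attn.to_k"]

def random_lora():
    return {n: (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
            for n in layers}

def perturb(lora, noise=0.05):
    # A noisy copy plays the role of another LoRA trained on the same style.
    return {n: (B + noise * rng.normal(size=B.shape),
                A + noise * rng.normal(size=A.shape))
            for n, (B, A) in lora.items()}

styles = [random_lora(), random_lora()]   # two synthetic "styles"
labels = np.array([0] * 5 + [1] * 5)      # style label of each LoRA
loras = [perturb(styles[s]) for s in labels]

X = np.stack([lora_descriptor(l) for l in loras])
Z = pca_reduce(X, dim=4)
neighbors = retrieve(Z, query_idx=0, top_k=4)
print("neighbors of LoRA 0:", neighbors, "with styles", labels[neighbors])
```

In practice the descriptors would come from real LoRA checkpoints rather than random matrices, and the choice of which layers to include and how to reduce dimensionality would need to follow the paper itself.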
Abstract
Recent advances in diffusion models and parameter-efficient fine-tuning (PEFT) have made text-to-image generation and customization widely accessible, with Low Rank Adaptation (LoRA) able to replicate an artist's style or subject using minimal data and computation. In this paper, we examine the relationship between LoRA weights and artistic styles, demonstrating that LoRA weights alone can serve as an effective descriptor of style, without the need for additional image generation or knowledge of the original training set. Our findings show that LoRA weights yield better performance in clustering of artistic styles compared to traditional pre-trained features, such as CLIP and DINO, with strong structural similarities between LoRA-based and conventional image-based embeddings observed both qualitatively and quantitatively. We identify various retrieval scenarios for the growing…
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization
Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Diffusion · Contrastive Language-Image Pre-training
