A LoRA is Worth a Thousand Pictures

Chenxi Liu; Towaki Takikawa; Alec Jacobson

arXiv:2412.12048·cs.CV·December 17, 2024

TL;DR

This paper shows that the weights of a LoRA fine-tuned on an artistic style can themselves describe that style, enabling style clustering and retrieval without generating images or knowing the training set.

Contribution

The paper shows that LoRA weights alone can serve as style descriptors, outperforming traditional pre-trained features such as CLIP and DINO in clustering and retrieval tasks, and discusses future applications such as zero-shot fine-tuning.

Findings

01 LoRA weights outperform traditional pre-trained features in style clustering.

02 LoRA-based embeddings show structural similarity to image-based embeddings.

03 The approach enables accurate style retrieval without knowledge of the training images.

Abstract

Recent advances in diffusion models and parameter-efficient fine-tuning (PEFT) have made text-to-image generation and customization widely accessible, with Low-Rank Adaptation (LoRA) able to replicate an artist's style or subject using minimal data and computation. In this paper, we examine the relationship between LoRA weights and artistic styles, demonstrating that LoRA weights alone can serve as an effective descriptor of style, without the need for additional image generation or knowledge of the original training set. Our findings show that LoRA weights yield better performance in clustering of artistic styles compared to traditional pre-trained features, such as CLIP and DINO, with strong structural similarities between LoRA-based and conventional image-based embeddings observed both qualitatively and quantitatively. We identify various retrieval scenarios for the growing…
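To make the idea of "weights as a style descriptor" concrete, here is a minimal sketch of weight-based retrieval. It is not the authors' pipeline: the excerpt above does not specify how the paper turns LoRA weights into an embedding, so this sketch assumes each LoRA layer is stored as a pair of low-rank factors `B` and `A`, forms the update `B @ A`, flattens and concatenates the updates, and ranks candidates by cosine similarity. The helper names (`lora_descriptor`, `retrieve`, `toy_lora`) and the synthetic "styles" are hypothetical and exist only for illustration.

```python
import numpy as np

def lora_descriptor(layers):
    """Flatten a LoRA into one unit-length vector.

    `layers` maps a layer name to (B, A) with B of shape (d, r) and A of
    shape (r, k). Using the product B @ A (an assumption of this sketch)
    avoids depending on how the update happens to be factorized.
    """
    parts = []
    for name in sorted(layers):          # fixed layer order keeps vectors comparable
        B, A = layers[name]
        parts.append((B @ A).ravel())
    vec = np.concatenate(parts)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query_vec, gallery, top_k=1):
    """Rank gallery descriptors by cosine similarity to the query."""
    names = list(gallery)
    sims = np.array([query_vec @ gallery[n] for n in names])
    order = np.argsort(-sims)[:top_k]    # highest similarity first
    return [(names[i], float(sims[i])) for i in order]

def toy_lora(base, rng, noise=0.05):
    """Hypothetical stand-in for a LoRA trained on a style: base factors plus noise."""
    return {
        name: (B + noise * rng.standard_normal(B.shape),
               A + noise * rng.standard_normal(A.shape))
        for name, (B, A) in base.items()
    }

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2                        # toy sizes; real layers are far larger

# Two synthetic "styles", each defined by random low-rank factors per layer.
styles = {
    s: {f"layer{i}": (rng.standard_normal((d, r)), rng.standard_normal((r, k)))
        for i in range(2)}
    for s in ("style0", "style1")
}

# Three noisy LoRAs per style form the gallery; no images are involved.
gallery = {
    f"{s}_lora{j}": lora_descriptor(toy_lora(base, rng))
    for s, base in styles.items()
    for j in range(3)
}

query = lora_descriptor(toy_lora(styles["style0"], rng))
best = retrieve(query, gallery, top_k=3)
print(best)
```

In this toy setup the query is compared against the gallery using weights only, which mirrors the abstract's claim that neither generated images nor the training set is needed. For real checkpoints, the flattened vectors would be very high-dimensional, so some form of dimensionality reduction before clustering or retrieval would likely be needed; that step is omitted here.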

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Diffusion · Contrastive Language-Image Pre-training