Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Zeju Qiu; Weiyang Liu; Haiwen Feng; Yuxuan Xue; Yao Feng; Zhen Liu; Dan Zhang; Adrian Weller; Bernhard Schölkopf

arXiv:2306.07280·cs.CV·March 15, 2024·21 cites

TL;DR

This paper introduces Orthogonal Finetuning (OFT), a novel method for adapting text-to-image diffusion models to downstream tasks while preserving their semantic generation ability, with improved stability and performance.

Contribution

The paper proposes OFT and its constrained variant COFT, finetuning techniques that preserve the pretrained model's hyperspherical energy and improve finetuning stability. The authors report that both outperform existing methods on text-to-image tasks.

Findings

01

OFT preserves hyperspherical energy, maintaining semantic generation.

02

OFT outperforms existing methods in quality and convergence speed.

03

COFT adds a radius constraint on the hypersphere to improve finetuning stability.

Abstract

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning…
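
The energy-preservation claim in the abstract can be checked numerically with a minimal sketch. Several choices below are assumptions for illustration and are not taken from the text above: hyperspherical energy is computed as the sum of inverse pairwise distances between unit-normalized neurons, the orthogonal matrix is built with a Cayley transform of a skew-symmetric matrix, and a random matrix stands in for pretrained weights. The function names are hypothetical.

```python
import numpy as np

def hyperspherical_energy(W, eps=1e-12):
    """Sum of inverse pairwise distances between unit-normalized neurons.

    Neurons are the columns of W. The inverse-distance form is an assumed
    definition chosen for this sketch.
    """
    U = W / np.linalg.norm(W, axis=0, keepdims=True)  # project onto unit sphere
    diff = U[:, :, None] - U[:, None, :]              # shape (d, n, n)
    dist = np.linalg.norm(diff, axis=0)               # pairwise distances (n, n)
    i, j = np.triu_indices(W.shape[1], k=1)           # each unordered pair once
    return float(np.sum(1.0 / (dist[i, j] + eps)))

def cayley_orthogonal(A):
    """Build an orthogonal matrix from an arbitrary square matrix A.

    Q = A - A^T is skew-symmetric, so I - Q is always invertible and
    R = (I + Q)(I - Q)^{-1} is orthogonal.
    """
    Q = A - A.T
    I = np.eye(A.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

rng = np.random.default_rng(0)
d, n = 8, 6
W0 = rng.normal(size=(d, n))                 # stand-in for pretrained weights

R = cayley_orthogonal(0.3 * rng.normal(size=(d, d)))
W_orth = R @ W0                              # orthogonal (multiplicative) update
W_add = W0 + 0.5 * rng.normal(size=(d, n))   # generic additive update

e0 = hyperspherical_energy(W0)
e_orth = hyperspherical_energy(W_orth)
e_add = hyperspherical_energy(W_add)

print("orthogonality error:", np.abs(R.T @ R - np.eye(d)).max())
print("energy change, orthogonal update:", abs(e_orth - e0))
print("energy change, additive update:  ", abs(e_add - e0))
```

Because an orthogonal matrix preserves norms and pairwise angles, the energy of `W_orth` matches that of `W0` up to floating-point error, whereas a generic additive update changes it. This illustrates only the invariance property, not the full finetuning procedure.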

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion