Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

Anton Voronov; Mikhail Khoroshikh; Artem Babenko; Max Ryabinin

arXiv:2302.04841·cs.CV·November 2, 2023

TL;DR

This paper introduces a simple early stopping criterion based on the training objective to accelerate text-to-image model personalization, achieving up to 8x faster adaptation without quality loss.

Contribution

The authors propose a novel, easy-to-implement early stopping method that tracks objective dynamics to speed up personalization of large text-to-image models.
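As a rough illustration of what "tracking objective dynamics" could look like in code, here is a minimal sketch. It is not the authors' criterion: the abstract on this page is truncated before the details, so the rule used here (stop once the variance of the most recent objective values is small relative to the variance over the whole run) is an assumption. The names `ObjectiveTracker`, `stopping_step`, `window`, and `threshold` are hypothetical.

```python
from statistics import pvariance


class ObjectiveTracker:
    """Hypothetical helper: signals a stop once recent objective values flatten out."""

    def __init__(self, window=5, threshold=0.1):
        self.window = window        # number of most recent values to compare
        self.threshold = threshold  # assumed cutoff for recent / overall variance
        self.history = []

    def update(self, value):
        """Record one objective value; return True if training should stop."""
        self.history.append(float(value))
        # Wait until there is enough history for a meaningful comparison.
        if len(self.history) < 2 * self.window:
            return False
        overall = pvariance(self.history)
        if overall == 0.0:
            # The objective never changed, so there is nothing left to track.
            return True
        recent = pvariance(self.history[-self.window:])
        return recent / overall < self.threshold


def stopping_step(values, window=5, threshold=0.1):
    """Return the first step index at which the tracker fires, or None."""
    tracker = ObjectiveTracker(window, threshold)
    for step, value in enumerate(values):
        if tracker.update(value):
            return step
    return None


# Toy sequence: the objective drops quickly, then plateaus.
plateau = [10, 8, 6, 4, 2, 1, 1, 1, 1, 1, 1, 1]
print(stopping_step(plateau, window=3, threshold=0.1))
```

In a real personalization loop, `update` would be called with the objective computed at each evaluation step. The values of `window` and `threshold` are placeholders that would need tuning per method.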

Findings

01 Up to 8 times faster adaptation with no quality loss.

02 Most concepts are learned early, and standard metrics fail to indicate convergence.

03 The method is effective across multiple concepts and personalization techniques.

Abstract

Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural way to achieve flexible yet fine-grained control over the result. One emerging area of research is the fast adaptation of large text-to-image models to smaller datasets or new visual concepts. However, many efficient methods of adaptation have a long training time, which limits their practical applications, slows down experiments, and spends excessive GPU resources. In this work, we study the training dynamics of popular text-to-image personalization methods (such as Textual Inversion or DreamBooth), aiming to speed them up. We observe that most concepts are learned at early stages and do not improve in quality later, but standard training convergence metrics fail to indicate that. Instead, we propose a simple drop-in early stopping criterion that only requires computing the…

Code & Models

Repositories

yandex-research/dvar
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Early Stopping