Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework

Vladimir Arkhipkin; Viacheslav Vasilev; Andrei Filatov; Igor Pavlov; Julia Agafonova; Nikolai Gerasimenko; Anna Averchenkova; Evelina Mironova; Anton Bukashkin; Konstantin Kulikov; Andrey Kuznetsov; Denis Dimitrov

arXiv:2410.21061·cs.CV·October 29, 2024

TL;DR

Kandinsky 3 is a versatile, high-quality text-to-image diffusion model that supports multiple image generation tasks and is optimized for efficiency, with publicly available code and a user-friendly demo.

Contribution

Introduces Kandinsky 3, a multifunctional T2I model with a simple architecture and extended capabilities, plus a distilled version that runs faster without quality loss.

Findings

01 High-quality, photorealistic image generation

02 Supports diverse tasks like inpainting, fusion, and video synthesis

03 Faster inference with maintained image quality

Abstract

Text-to-image (T2I) diffusion models are popular for introducing image manipulation methods, such as editing, image fusion, inpainting, etc. At the same time, image-to-video (I2V) and text-to-video (T2V) models are also built on top of T2I models. We present Kandinsky 3, a novel T2I model based on latent diffusion, achieving a high level of quality and photorealism. The key feature of the new architecture is the simplicity and efficiency of its adaptation for many types of generation tasks. We extend the base T2I model for various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variations generation, I2V and T2V generation. We also present a distilled version of the T2I model, evaluating inference in 4 steps of the reverse process without reducing image quality and 3 times faster than…
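The abstract's mention of running the reverse process in 4 steps can be made concrete with a toy sketch. The code below is not the Kandinsky 3 sampler or its distillation method; it is a generic deterministic DDIM-style loop in plain Python, included only to show that the step count sets the number of denoiser calls. Every name here (`make_alpha_bars`, `ddim_sample`, `OracleDenoiser`), the linear beta schedule, and the step counts 50 and 4 are assumptions chosen for illustration. The "denoiser" is a hypothetical oracle that knows a single target vector, standing in for a trained network.

```python
import math
import random


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


class OracleDenoiser:
    """Hypothetical stand-in for a trained network: returns the exact noise
    for a data distribution concentrated on one known target vector."""

    def __init__(self, target, alpha_bars):
        self.target = target
        self.alpha_bars = alpha_bars
        self.calls = 0  # counts network evaluations

    def __call__(self, x, t):
        self.calls += 1
        ab = self.alpha_bars[t]
        return [(xi - math.sqrt(ab) * ti) / math.sqrt(1.0 - ab)
                for xi, ti in zip(x, self.target)]


def ddim_sample(denoiser, x, alpha_bars, num_steps):
    """Deterministic DDIM-style reverse process using `num_steps` denoiser calls."""
    n = len(alpha_bars)
    # Evenly spaced timesteps from the noisiest (n - 1) down towards 0.
    timesteps = [round((n - 1) * (num_steps - k) / num_steps)
                 for k in range(num_steps)]
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        # After the last step the sample is treated as noise-free.
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = denoiser(x, t)
        # Predict the clean sample, then re-noise it to the previous level.
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t)
                   for xi, ei in zip(x, eps)]
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * ei
             for x0, ei in zip(x0_pred, eps)]
    return x


alpha_bars = make_alpha_bars()
target = [0.5, -1.0, 2.0]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]

for steps in (50, 4):  # arbitrary step counts for comparison
    model = OracleDenoiser(target, alpha_bars)
    sample = ddim_sample(model, list(noise), alpha_bars, steps)
    print(steps, "steps,", model.calls, "denoiser calls:", [round(v, 6) for v in sample])
```

With an exact oracle, both step counts recover the target, so only the number of denoiser calls differs. A real network is not exact, and keeping quality at very few steps is what a distilled model is trained for.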

Code & Models

Repositories

ai-forever/kandinsky-3 (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion · Balanced Selection