Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Vladimir Arkhipkin; Vladimir Korviakov; Nikolai Gerasimenko; Denis Parkhomenko; Viacheslav Vasilev; Alexey Letunovskiy; Nikolai Vaulin; Maria Kovaleva; Ivan Kirillov; Lev Novitskiy; Denis Koposov; Nikita Kiselev; Alexander Varlamov; Dmitrii Mikhailov; Vladimir Polovnikov; Andrey Shutkin; Julia Agafonova; Ilya Vasiliev; Anastasiia Kargapoltseva; Anna Dmitrienko; Anastasia Maltseva; Anna Averchenkova; Olga Kim; Tatiana Nikulina; Denis Dimitrov

arXiv:2511.14993 · cs.CV · November 21, 2025



TL;DR

Kandinsky 5.0 is a family of foundation models for high-resolution image and 10-second video generation. It spans three line-ups (Image Lite at 6B parameters, Video Lite at 2B, and Video Pro at 19B), uses training and inference optimizations aimed at quality and speed, and is released as open source.

Contribution

The paper introduces Kandinsky 5.0, a new family of foundation models for image and video generation, including novel training, architectural, and inference optimizations.

Findings

01 Achieves state-of-the-art performance in image and video synthesis

02 Demonstrates high generation speed and quality through novel architectural, training, and inference optimizations

03 Provides open-source code and models for the research community

Abstract

This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core model line-ups: Kandinsky 5.0 Image Lite - a line-up of 6B-parameter image generation models; Kandinsky 5.0 Video Lite - fast and lightweight 2B-parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro - 19B-parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle - including collection, processing, filtering, and clustering - for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as self-supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations…



Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Human Motion and Animation