PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

Denis Korzhenkov; Adil Karjauv; Animesh Karnewar; Mohsen Ghafoorian; Amirhossein Habibian

arXiv:2601.04792 · cs.CV · January 9, 2026

TL;DR

This paper introduces PyramidalWan, a method to convert pretrained video diffusion models into pyramidal structures via low-cost finetuning, significantly improving inference efficiency without sacrificing output quality.

Contribution

It proposes a novel pipeline for transforming pretrained diffusion models into pyramidal models through efficient finetuning, and explores step distillation strategies to further boost inference speed.

Findings

1. Successful conversion of pretrained models into pyramidal structures
2. Maintained high visual quality after finetuning
3. Enhanced inference efficiency through step distillation strategies

Abstract

Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach significantly reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform compared to state-of-the-art systems in terms of visual plausibility. In this work, we present a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, achieving this transformation without degradation in quality of output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal…
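The stage-to-resolution idea in the abstract can be sketched in a few lines. This is a minimal illustration of the generic pyramidal scheme, not PyramidalWan's actual schedule. The equal-width noise intervals, the factor-2 spatial downsampling per stage, the helper names (`make_pyramid`, `stage_for_noise`, `relative_cost`), and the cost model in which each denoising step costs time proportional to the number of latent tokens are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    t_start: float  # noise level where the stage begins (1.0 = pure noise)
    t_end: float    # noise level where the stage ends (0.0 = clean)
    scale: int      # spatial downsampling factor relative to full resolution


def make_pyramid(num_stages: int) -> list[Stage]:
    """Hypothetical schedule: split noise levels [0, 1] into equal intervals.

    Stage 0 is the noisiest and runs at the lowest resolution; each later
    stage halves the downsampling factor until the last one runs at full size.
    """
    stages = []
    for k in range(num_stages):
        stages.append(
            Stage(
                t_start=1.0 - k / num_stages,
                t_end=1.0 - (k + 1) / num_stages,
                scale=2 ** (num_stages - 1 - k),
            )
        )
    return stages


def stage_for_noise(stages: list[Stage], t: float) -> Stage:
    """Return the stage whose noise interval contains t (noisier stage wins ties)."""
    for stage in stages:
        if stage.t_end <= t <= stage.t_start:
            return stage
    raise ValueError(f"noise level {t} outside [0, 1]")


def relative_cost(stages: list[Stage], height: int, width: int, frames: int,
                  steps_per_stage: int) -> float:
    """Pyramidal cost divided by single-resolution cost with the same step count.

    Simplifying assumption: cost per denoising step is proportional to the
    number of latent tokens (frames x height x width); only the spatial
    dimensions are downsampled, the temporal one is left unchanged.
    """
    full_tokens = frames * height * width
    pyramid = sum(
        steps_per_stage * frames * (height // s.scale) * (width // s.scale)
        for s in stages
    )
    baseline = steps_per_stage * len(stages) * full_tokens
    return pyramid / baseline


if __name__ == "__main__":
    pyramid = make_pyramid(3)
    for s in pyramid:
        print(s)
    print(stage_for_noise(pyramid, 0.9).scale)  # 4: noisiest input, coarsest grid
    print(relative_cost(pyramid, height=64, width=64, frames=16, steps_per_stage=10))  # 0.4375
```

With three stages at 1/4, 1/2, and full resolution per side, the token counts are 1/16, 1/4, and 1 of the full grid. Under the linear cost assumption, the pyramid therefore costs 21/48 = 0.4375 of a single-resolution run with the same number of steps. Because attention cost grows faster than linearly in the number of tokens, the real savings of the low-resolution stages would be larger than this estimate. The sketch also omits the upsampling and re-noising needed at stage transitions.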
