UniVST: A Unified Framework for Training-free Localized Video Style Transfer

Quanjian Song; Mingbao Lin; Wengyi Zhan; Shuicheng Yan; Liujuan Cao; Rongrong Ji

arXiv:2410.20084 · cs.CV · November 19, 2025

TL;DR

UniVST is a training-free, unified diffusion-based framework for localized video style transfer that improves temporal consistency and detail preservation.

Contribution

It proposes a training-free, diffusion-based method for localized video style transfer, built from three components: point-matching mask propagation, AdaIN-guided stylization, and optical-flow-based smoothing.
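As a rough illustration of point-matching mask propagation, the sketch below labels each point in a target frame by a majority vote over its most similar points in a reference frame whose mask is known. It is not the authors' implementation: the function names, the cosine-similarity k-nearest-neighbour vote, and the parameters `k` and `threshold` are all assumptions made for this example, and the feature vectors are stand-ins for the DDIM-inversion feature maps the paper uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def propagate_mask(ref_feats, ref_mask, tgt_feats, k=3, threshold=0.5):
    """Hypothetical helper: propagate a binary mask from a reference frame.

    ref_feats: one feature vector per reference point
    ref_mask:  one 0/1 label per reference point
    tgt_feats: one feature vector per target point
    Returns one 0/1 label per target point.
    """
    tgt_mask = []
    for feat in tgt_feats:
        # Rank reference points by similarity to this target point.
        scored = sorted(
            ((cosine(feat, ref), label) for ref, label in zip(ref_feats, ref_mask)),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        # Average the labels of the k best matches and threshold the vote.
        vote = sum(label for _, label in scored) / len(scored)
        tgt_mask.append(1 if vote >= threshold else 0)
    return tgt_mask


# Toy data: two "object" points and two "background" points in the reference.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_mask = [1, 1, 0, 0]
tgt_feats = [[1.0, 0.05], [0.05, 1.0]]
propagated = propagate_mask(ref_feats, ref_mask, tgt_feats, k=2)
```

Because the labels come from feature similarity alone, a scheme of this kind needs no separate tracking model, which is the property the paper highlights.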

Findings

01 Outperforms existing methods in quantitative metrics

02 Achieves better temporal consistency and detail preservation

03 Operates without training, simplifying deployment

Abstract

This paper presents UniVST, a unified framework for localized video style transfer based on diffusion models. It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos. Its contributions are: (1) A point-matching mask propagation strategy that leverages the feature maps from the DDIM inversion. This streamlines the model's architecture by obviating the need for tracking models. (2) A training-free AdaIN-guided localized video stylization mechanism that operates at both the latent and attention levels. This balances content fidelity and style richness, mitigating the loss of localized details commonly associated with direct video stylization. (3) A sliding-window consistent smoothing scheme that harnesses optical flow within the pixel representation and refines predicted noise to update…
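For readers unfamiliar with AdaIN (adaptive instance normalization), the operation re-scales each channel of a content feature so that its mean and standard deviation match those of a style feature: AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y). The sketch below applies it per channel and then blends the result back into the original through a binary mask to keep the change localized. It uses plain Python lists for clarity; the names `adain` and `masked_adain`, the `eps` term, and the blend-after-normalization step are assumptions for this example and are not taken from the UniVST code.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Match each content channel's mean/std to the style channel's.

    content, style: lists of channels, each channel a flat list of floats.
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mu, c_sigma = fmean(c_chan), pstdev(c_chan)
        s_mu, s_sigma = fmean(s_chan), pstdev(s_chan)
        # Normalize the content channel, then apply the style statistics.
        out.append([s_sigma * (x - c_mu) / (c_sigma + eps) + s_mu for x in c_chan])
    return out


def masked_adain(content, style, mask, eps=1e-5):
    """Hypothetical localized variant: keep original values where mask is 0."""
    stylized = adain(content, style, eps)
    return [
        [m * s + (1.0 - m) * c for s, c, m in zip(s_chan, c_chan, mask)]
        for s_chan, c_chan in zip(stylized, content)
    ]


# Toy single-channel "latent" with four positions.
content = [[1.0, 2.0, 3.0, 4.0]]
style = [[10.0, 20.0, 30.0, 40.0]]
mask = [1.0, 1.0, 0.0, 0.0]

full = adain(content, style)
local = masked_adain(content, style, mask)
```

In `local`, the first two positions take on the style statistics while the last two keep their original content values, which is the sense in which the stylization is localized.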

Code & Models

Repositories

QuanjianSong/UniVST (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · Diffusion