Self-Refining Video Sampling

Sangwon Jang; Taekyung Ki; Jaehyeong Jo; Saining Xie; Jaehong Yoon; Sung Ju Hwang

arXiv:2601.18577 · cs.CV · May 21, 2026

TL;DR

This paper introduces a self-refining method for video sampling that iteratively improves generated videos at inference time using the generator itself as a denoising autoencoder, enhancing motion realism without extra training.

Contribution

It proposes a novel self-refining inference technique for video generators, including an uncertainty-aware refinement strategy to improve physical realism and motion coherence.

Findings

1. Achieves over 70% human preference compared with baseline samplers.
2. Significantly improves motion coherence and physics alignment in generated videos.
3. Demonstrates effectiveness across state-of-the-art video generators.

Abstract

Modern video generators still struggle with complex physical dynamics, often falling short of physical realism. Existing approaches address this using external verifiers or additional training on augmented data, which is computationally expensive and still limited in capturing fine-grained motion. In this work, we present self-refining video sampling, a simple method that uses a pre-trained video generator trained on large-scale datasets as its own self-refiner. By interpreting the generator as a denoising autoencoder, we enable iterative inner-loop refinement at inference time without any external verifier or additional training. We further introduce an uncertainty-aware refinement strategy that selectively refines regions based on self-consistency, which prevents artifacts caused by over-refinement. Experiments on state-of-the-art video generators demonstrate significant improvements…
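The inner-loop idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the denoiser below is a hypothetical stand-in (simple temporal smoothing) for a pre-trained video generator, and the noise level `sigma`, the threshold `tau`, and the masking rule (refine only where the new prediction disagrees with the current estimate) are assumptions chosen for illustration, since the abstract does not spell out these details.

```python
import numpy as np


def toy_denoiser(noisy, sigma):
    """Hypothetical stand-in for a generator's clean-sample prediction.

    Blends each frame with a 3-frame temporal average (axis 0 is time);
    the blend is stronger at higher noise levels.
    """
    pad_width = [(1, 1)] + [(0, 0)] * (noisy.ndim - 1)
    padded = np.pad(noisy, pad_width, mode="edge")
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    weight = sigma / (1.0 + sigma)
    return (1.0 - weight) * noisy + weight * smoothed


def self_refine(video, denoiser, sigma=0.5, num_iters=3, tau=0.05, seed=0):
    """Toy refinement loop: perturb the estimate, denoise it, update selectively.

    Assumption: elements whose new prediction differs from the current
    estimate by more than `tau` are treated as uncertain and refined;
    all other elements are left unchanged.
    """
    rng = np.random.default_rng(seed)
    current = np.array(video, dtype=float)  # work on a copy
    for _ in range(num_iters):
        # Re-noise the current estimate, then let the "generator" denoise it.
        noisy = current + sigma * rng.standard_normal(current.shape)
        proposal = denoiser(noisy, sigma)
        # Per-element disagreement serves as a crude self-consistency signal.
        disagreement = np.abs(proposal - current)
        refine_mask = disagreement > tau
        current = np.where(refine_mask, proposal, current)
    return current


# Usage: an 8-frame, 4x4 "video" whose brightness ramps up over time.
frames = np.linspace(0.0, 1.0, 8)[:, None, None] * np.ones((8, 4, 4))
refined = self_refine(frames, toy_denoiser)
```

In this sketch, setting `tau=np.inf` leaves the input untouched, while a negative `tau` accepts every proposal, which corresponds to refining everywhere with no selectivity.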

Code & Models

Repositories

agwmon/self-refine-video (GitHub)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Pose and Action Recognition