SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Jianyi Wang; Shanchuan Lin; Zhijie Lin; Yuxi Ren; Meng Wei; Zongsheng Yue; Shangchen Zhou; Hao Chen; Yang Zhao; Ceyuan Yang; Xuefeng Xiao; Chen Change Loy; Lu Jiang

arXiv:2506.05301 · cs.CV · January 29, 2026

TL;DR

SeedVR2 is a one-step diffusion-based video restoration model that uses adaptive window attention and new training losses to produce high-quality results efficiently, matching or outperforming existing methods.

Contribution

The paper presents SeedVR2, a one-step diffusion-based video restoration (VR) model with adaptive window attention and improved training techniques for high-resolution inputs.

Findings

01. Achieves comparable or better performance than existing VR methods.

02. Operates effectively in a single step, reducing computational cost.

03. Handles high-resolution video restoration with adaptive mechanisms.

Abstract

Recent advances in diffusion-based video restoration (VR) demonstrate significant improvement in visual quality, yet incur a prohibitive computational cost during inference. While several distillation-based approaches have exhibited the potential of one-step image restoration, extending existing approaches to VR remains challenging and underexplored, particularly when dealing with high-resolution video in real-world settings. In this work, we propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR within a single step, we introduce several enhancements to both model architecture and training procedures. Specifically, an adaptive window attention mechanism is proposed, where the window size is dynamically adjusted to fit the output resolutions, avoiding window inconsistency…
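The idea of sizing attention windows from the output resolution can be illustrated with a small sketch. The rule below (a fixed number of windows per axis, with the window size obtained by ceiling division) is an assumption chosen for illustration; the abstract does not state the exact rule SeedVR2 uses, and the names `adaptive_window_size`, `partition`, `window_shapes`, and `target_windows` are hypothetical.

```python
import math


def adaptive_window_size(height, width, target_windows=4):
    """Hypothetical rule: size windows so target_windows of them span each axis."""
    win_h = math.ceil(height / target_windows)
    win_w = math.ceil(width / target_windows)
    return win_h, win_w


def partition(height, width, win_h, win_w):
    """Tile a height x width feature map; return (top, left, bottom, right) boxes."""
    boxes = []
    for top in range(0, height, win_h):
        for left in range(0, width, win_w):
            bottom = min(top + win_h, height)
            right = min(left + win_w, width)
            boxes.append((top, left, bottom, right))
    return boxes


def window_shapes(boxes):
    """Distinct (height, width) shapes among the windows."""
    return {(b - t, r - l) for t, l, b, r in boxes}


# A 90 x 160 feature map, tiled with a fixed 64 x 64 window and with the adaptive rule.
fixed = partition(90, 160, 64, 64)
adaptive = partition(90, 160, *adaptive_window_size(90, 160))
print("fixed window shapes:   ", sorted(window_shapes(fixed)))
print("adaptive window shapes:", sorted(window_shapes(adaptive)))
```

With a fixed window, the map is cut into a few full windows plus much smaller leftover windows at the borders. Under the assumed adaptive rule, the windows stay close to one common size at any resolution, which is one plausible reading of "avoiding window inconsistency".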

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- I think the jump to truly one-step VR with a diffusion transformer (initialized from SeedVR) plus APT is a meaningful step beyond prior one-step image restoration; prior works are mostly teacher-distillation or rely on fixed diffusion priors that cap quality. This work claims distillation-free adversarial post-training against real data after a lightweight progressive distillation stage to bridge the gap, which is interesting for video.
- The adaptive window attention to handle arbitrary res

Weaknesses

- I am concerned about the compute-heaviness. I think the approach relies heavily on significant compute (72×H100, 10M/5M pairs), which limits reproducibility in typical academic labs despite code release plans. Claims of “largest-ever VR GAN” underscore this.
- Scope of degradations. While synthetic degradations follow prior work, I think the paper could better characterize real-world degradation diversity and robustness (e.g., compression artifacts, rolling shutter, severe motion blur) bey

Reviewer 02 · Rating 2 · Confidence 5

Strengths

- The introduction of adaptive window attention effectively reduces boundary artifacts when processing high-resolution frames.
- The training strategy which combines RpGAN, approximate R2 regularization, feature-matching losses, and progressive distillation to ensure stable convergence and high perceptual quality is comprehensive.
- The experiments are extensive and include both synthetic and real-world data, multiple objective and perceptual metrics, as well as a well-organized user study.
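For readers unfamiliar with two of the loss terms named above, here is a minimal sketch of a relativistic pairing (RpGAN) objective and a feature-matching loss, written on plain Python lists. It follows the commonly used softplus form of the relativistic loss; the exact formulation, weighting, and feature layers used in SeedVR2 are not given in this excerpt, so the function names and the choice of an L1 distance are assumptions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rpgan_losses(d_real, d_fake):
    """Relativistic pairing losses from paired discriminator logits.

    The discriminator is rewarded when each real logit exceeds its paired
    fake logit; the generator is rewarded for the opposite ordering.
    """
    n = len(d_real)
    d_loss = sum(softplus(-(r - f)) for r, f in zip(d_real, d_fake)) / n
    g_loss = sum(softplus(-(f - r)) for r, f in zip(d_real, d_fake)) / n
    return d_loss, g_loss


def feature_matching_loss(feats_real, feats_fake):
    """Mean absolute difference between discriminator features (assumed L1)."""
    n = len(feats_real)
    return sum(abs(r - f) for r, f in zip(feats_real, feats_fake)) / n


# Toy logits: the discriminator currently separates real from fake well,
# so its loss is small and the generator's loss is large.
d_loss, g_loss = rpgan_losses([2.0, 1.5], [-1.0, -0.5])
fm = feature_matching_loss([1.0, 2.0], [1.5, 1.0])
print(f"d_loss={d_loss:.4f} g_loss={g_loss:.4f} feature_matching={fm:.4f}")
```

Because the relativistic loss depends only on the difference between paired logits, it gives the generator a signal tied to real samples rather than to an absolute decision threshold. The feature-matching term adds a distance in discriminator feature space on top of that.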

Weaknesses

- My main concern is that the novelty of the method is somewhat limited, as it largely builds upon the existing Adversarial Post-Training (APT) framework, and the paper does not clearly explain the fundamental differences or new contributions beyond APT.
- The training process is extremely resource-intensive, requiring 72 H100 GPUs, which significantly limits reproducibility and practical accessibility.
- The method’s robustness under challenging conditions, such as heavy degradations, large m

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper introduces a novel one-step VR method by applying APT to diffusion-based models, reducing the computational burden significantly compared to traditional multi-step approaches.
2. The adaptive window attention mechanism for handling high-resolution videos and the feature matching loss for training stability are key contributions that improve the model's performance and robustness across varying video resolutions.
3. The method shows promising quantitative and qualitative results, o

Weaknesses

1. The paper lacks comparisons with the latest VSR methods presented at NeurIPS 2025 (such as DLoraL [1] and DOVE [2]). The authors should include comparisons with these methods to better demonstrate the competitiveness of the proposed approach.
2. The paper does not provide results trained on public datasets (such as REDS). The reported improvements might stem from using a larger private dataset. Will the authors make the dataset publicly available?
3. Despite achieving faster inference, the

Code & Models

Datasets

Iceclear/SeedVR_VideoDemos
dataset · 2.4k dl

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning

Methods: Softmax · Attention Is All You Need