Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi, Jing Xu, Lijing Lu, Zhihang Li, Kai Hu

TL;DR
This paper introduces a self-supervised, noise-robust video super-resolution framework that enhances content consistency and reduces artifacts in real-world videos by integrating spatio-temporal attention, contrastive learning, and a three-stage training strategy.
Contribution
The paper proposes a self-supervised ControlNet with Spatio-Temporal Mamba for real-world video super-resolution (VSR), reporting better perceptual quality and fewer artifacts than existing methods.
Findings
Achieves superior perceptual quality on real-world VSR benchmarks.
Effectively reduces artifacts and enhances content consistency.
Demonstrates robustness to complex degradations in real-world videos.
Abstract
Existing diffusion-based video super-resolution (VSR) methods are susceptible to introducing complex degradations and noticeable artifacts into high-resolution videos due to their inherent randomness. In this paper, we propose a noise-robust real-world VSR framework by incorporating self-supervised learning and Mamba into pre-trained latent diffusion models. To ensure content consistency across adjacent frames, we enhance the diffusion model with a global spatio-temporal attention mechanism using the Video State-Space block with a 3D Selective Scan module, which reinforces coherence at an affordable computational cost. To further reduce artifacts in generated details, we introduce a self-supervised ControlNet that leverages HR features as guidance and employs contrastive learning to extract degradation-insensitive features from LR videos. Finally, a three-stage training strategy based…
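The abstract does not say exactly how the 3D Selective Scan module traverses a video. The sketch below is a minimal, hypothetical illustration in Python/NumPy, not the authors' implementation. It flattens a (T, H, W, C) feature tensor along several assumed axis orders (each scanned forward and reversed), runs a heavily simplified selective state-space recurrence over each 1D sequence, and averages the results back onto the video grid. The names `selective_scan_1d`, `scan_3d`, `SCAN_ORDERS`, `w_delta`, and `a` are illustrative assumptions. Real Mamba blocks also use input-dependent B and C projections, a larger state dimension, gating, and a hardware-aware parallel scan.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def selective_scan_1d(seq, w_delta, a):
    """Simplified selective scan over a sequence of shape (L, C).

    Hypothetical sketch: each channel keeps one scalar state h. The step size
    delta depends on the current input (the 'selective' part), so the decay
    exp(delta * a) changes from token to token. `a` must be negative.
    """
    delta = softplus(seq @ w_delta)          # (L, C) input-dependent step sizes
    decay = np.exp(delta * a)                # (L, C) values in (0, 1) since a < 0
    h = np.zeros(seq.shape[1])
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        h = decay[t] * h + delta[t] * seq[t]  # discretized linear recurrence
        out[t] = h
    return out

# Hypothetical 3D traversal orders: permutations of the (T, H, W) axes.
SCAN_ORDERS = [(0, 1, 2), (0, 2, 1), (1, 2, 0), (2, 1, 0)]

def scan_3d(video, w_delta, a):
    """Average selective scans over several 3D orders of a (T, H, W, C) tensor."""
    C = video.shape[-1]
    total = np.zeros_like(video)
    for order in SCAN_ORDERS:
        perm = (*order, 3)                   # keep channels last
        seq = np.transpose(video, perm).reshape(-1, C)
        for reverse in (False, True):
            s = seq[::-1] if reverse else seq
            y = selective_scan_1d(s, w_delta, a)
            y = y[::-1] if reverse else y
            # Undo the flattening, then undo the axis permutation.
            y = y.reshape([video.shape[i] for i in perm])
            total += np.transpose(y, np.argsort(perm))
    return total / (2 * len(SCAN_ORDERS))

rng = np.random.default_rng(0)
video = rng.normal(size=(2, 3, 4, 5))        # T=2, H=3, W=4, C=5
w_delta = 0.1 * rng.normal(size=(5, 5))
a = -np.abs(rng.normal(size=5)) - 0.1
features = scan_3d(video, w_delta, a)        # same shape as the input video
```

Each scan order is a causal recurrence, so a position only sees tokens earlier in that order; combining forward and reversed scans over several axis orders gives every position context across both time and space, while the cost grows linearly with T·H·W instead of quadratically as in full spatio-temporal attention.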
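The abstract states that the self-supervised ControlNet uses contrastive learning, with HR features as guidance, to extract degradation-insensitive features from LR videos, but this excerpt does not give the loss. One common contrastive objective is InfoNCE. The sketch below assumes, hypothetically, that row i of an LR-feature batch and row i of an HR-feature batch come from the same clip and form a positive pair, with the other rows in the batch acting as negatives. `info_nce`, `l2_normalize`, and the temperature `tau=0.1` are illustrative choices, not taken from the paper.

```python
import numpy as np

def l2_normalize(z, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def info_nce(anchor, positive, tau=0.1):
    """InfoNCE loss where row i of `anchor` should match row i of `positive`.

    Hypothetical sketch, not the paper's loss: all other rows of `positive`
    in the batch serve as negatives for anchor row i.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / tau                       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # mean -log p(correct match)

rng = np.random.default_rng(0)
hr_feats = rng.normal(size=(8, 16))                    # stand-in HR-guided features
lr_feats = hr_feats + 0.05 * rng.normal(size=(8, 16))  # stand-in degraded-LR features
loss_aligned = info_nce(lr_feats, hr_feats)
loss_mismatched = info_nce(lr_feats, np.roll(hr_feats, 1, axis=0))
```

Minimizing this loss pulls each LR feature toward the HR feature of the same clip and away from other clips, regardless of how the LR input was degraded. That is one way to encourage the degradation insensitivity the abstract describes; the matched pairs above give a much lower loss than the deliberately mismatched ones.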
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Advanced Image Fusion Techniques
