SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution
Qi Tang, Yao Zhao, Meiqin Liu, Chao Yao

TL;DR
SeeClear is a diffusion-based video super-resolution (VSR) framework that improves detail consistency across frames. It distills semantics from low-resolution frames and uses them to guide conditional video generation through instance-centric alignment and channel-wise semantic controls.
Contribution
This work presents a VSR framework that combines a Semantic Distiller, a Pixel Condenser, and an Instance-Centric Alignment Module (InCAM) within a conditional diffusion process. The goal is perceptually realistic video enhancement whose details stay consistent from frame to frame.
Findings
Outperforms existing diffusion-based VSR methods in quality.
Effectively maintains detail consistency across frames.
Demonstrates superior perceptual realism in generated videos.
Abstract
Diffusion-based Video Super-Resolution (VSR) is renowned for generating perceptually realistic videos, yet it grapples with maintaining detail consistency across frames due to stochastic fluctuations. The traditional approach of pixel-level alignment is ineffective for diffusion-processed frames because of iterative disruptions. To overcome this, we introduce SeeClear, a novel VSR framework leveraging conditional video generation, orchestrated by instance-centric and channel-wise semantic controls. This framework integrates a Semantic Distiller and a Pixel Condenser, which synergize to extract and upscale semantic details from low-resolution frames. The Instance-Centric Alignment Module (InCAM) utilizes video-clip-wise tokens to dynamically relate pixels within and across frames, enhancing coherency. Additionally, the Channel-wise Texture Aggregation Memory (CaTeGory) infuses extrinsic…
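The abstract describes InCAM only at a high level: clip-wise tokens relate pixels within and across frames. The following is a minimal NumPy sketch of one way such token-mediated alignment could work, assuming a two-way cross-attention between a small set of tokens shared by the whole clip and the pixel features of every frame. The function name clip_token_alignment, the two-step update, the residual connections, and the omission of learned projections are all hypothetical simplifications, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def clip_token_alignment(features, tokens):
    """Hypothetical sketch: couple pixels within and across frames via shared clip-wise tokens.

    features: (T, H, W, C) pixel features for a T-frame clip
    tokens:   (K, C) tokens shared by every frame in the clip
    returns:  (T, H, W, C) aligned features, same shape as the input
    """
    T, H, W, C = features.shape
    # Flatten all frames into a single set of N = T*H*W pixels,
    # so attention is not restricted to one frame.
    pixels = features.reshape(T * H * W, C)
    scale = 1.0 / np.sqrt(C)

    # Step 1 (tokens <- pixels): each token aggregates information from every pixel in the clip.
    attn_tp = softmax(tokens @ pixels.T * scale, axis=-1)          # (K, N)
    tokens_updated = tokens + attn_tp @ pixels                     # (K, C), residual update

    # Step 2 (pixels <- tokens): each pixel reads back from the updated tokens.
    # Pixels in different frames that attend to the same token receive related updates.
    attn_pt = softmax(pixels @ tokens_updated.T * scale, axis=-1)  # (N, K)
    aligned = pixels + attn_pt @ tokens_updated                    # (N, C), residual update

    return aligned.reshape(T, H, W, C)


rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 4, 8))  # 3 frames, 4x4 pixels, 8 channels
toks = rng.standard_normal((5, 8))         # 5 clip-wise tokens
out = clip_token_alignment(feats, toks)
print(out.shape)  # (3, 4, 4, 8)
```

Routing cross-frame interaction through a few shared tokens keeps the cost linear in the number of pixels (N x K attention instead of N x N) and avoids explicit pixel-level motion alignment, which the abstract reports is unreliable for diffusion-processed frames.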
Taxonomy
Topics: Advanced Image Processing Techniques
Methods: Diffusion
