TL;DR
This paper introduces a blind recovery method for bitstream-corrupted video. Its Metadata-Guided Diffusion Model (M-GDM) needs no predefined masks. It uses intrinsic video metadata and a new mask-prediction approach to locate corrupted regions and restore their content.
Contribution
The paper proposes a new blind video recovery framework built on intrinsic video metadata and a prior-driven mask predictor, which eliminates the need for manual mask annotation.
Findings
The method restores corrupted videos without requiring predefined masks of the corrupted regions.
The dual-stream metadata encoder embeds motion vectors and frame types separately, then fuses them into a unified representation.
Post-refinement improves boundary consistency between recovered and intact regions.
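The summary does not describe how the paper's post-refinement works. A simpler, widely used way to reduce visible seams between restored and intact pixels is feathered compositing: blur the binary corruption mask into a soft weight map, then blend the two images. The NumPy sketch below shows only that generic technique and does not reproduce the authors' module. The function names `box_blur` and `composite` and the `radius` parameter are hypothetical.

```python
import numpy as np


def box_blur(mask: np.ndarray, radius: int) -> np.ndarray:
    """Average each pixel over a (2r+1) x (2r+1) window, using edge padding."""
    k = 2 * radius + 1
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def composite(observed: np.ndarray, recovered: np.ndarray,
              mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Blend recovered content into the observed frame with a feathered mask.

    Hypothetical stand-in for post-refinement. `mask` is 1 where a region is
    predicted corrupted and 0 where it is intact. Pixels deep inside the mask
    take the recovered value, pixels far outside keep the observed value, and
    pixels near the boundary get a weighted mix of the two.
    """
    soft = box_blur(mask.astype(float), radius)
    return soft * recovered + (1.0 - soft) * observed


# Usage: a 3x3 corrupted patch in a 7x7 frame.
observed = np.zeros((7, 7))
recovered = np.ones((7, 7))
mask = np.zeros((7, 7))
mask[2:5, 2:5] = 1
print(composite(observed, recovered, mask).round(2))
```

With a hard mask, the output would switch abruptly from observed to recovered values at the mask edge. The blur turns that step into a short ramp whose width is set by `radius`.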
Abstract
Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corrupted regions are available, but manually annotating these masks is labor-intensive and impractical in real-world scenarios. To address this limitation, we introduce a new blind video recovery setting that removes the reliance on predefined masks. This setting presents two major challenges: accurately identifying corrupted regions and recovering content from extensive and irregular degradations. We propose a Metadata-Guided Diffusion Model (M-GDM) to tackle these challenges. Specifically, intrinsic video metadata are leveraged as corruption indicators through a dual-stream metadata encoder that separately embeds motion vectors and frame types before fusing them into a unified representation. This representation…
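The abstract describes the dual-stream encoder only at a high level: motion vectors and frame types are embedded separately, then fused. The following is a minimal NumPy sketch of that idea under several assumptions, and it is not the authors' architecture. Every choice here is hypothetical and made for illustration:

- the class name `DualStreamMetadataEncoder`;
- a per-block `(dx, dy)` motion-vector grid of shape `(T, H, W, 2)`;
- an I/P/B frame-type vocabulary;
- a tanh linear projection for the motion stream;
- concatenate-then-project fusion.

Random weights stand in for learned ones.

```python
import numpy as np

# Hypothetical frame-type vocabulary (I/P/B frames as in common codecs).
FRAME_TYPES = {"I": 0, "P": 1, "B": 2}


class DualStreamMetadataEncoder:
    """Toy dual-stream encoder: one stream per metadata type, then fusion.

    All weights are random placeholders; a trained model would learn them.
    """

    def __init__(self, dim: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Motion stream: per-block linear map from (dx, dy) to `dim` features.
        self.w_mv = rng.standard_normal((2, dim)) / np.sqrt(2)
        # Frame-type stream: one embedding vector per frame type.
        self.frame_table = rng.standard_normal((len(FRAME_TYPES), dim))
        # Fusion: project the concatenated 2*dim features back to `dim`.
        self.w_fuse = rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim)

    def __call__(self, motion_vectors: np.ndarray, frame_types: list) -> np.ndarray:
        # motion_vectors: (T, H, W, 2); frame_types: one label per frame.
        t, h, w, _ = motion_vectors.shape
        if len(frame_types) != t:
            raise ValueError("need exactly one frame type per frame")
        mv_feat = np.tanh(motion_vectors @ self.w_mv)        # (T, H, W, dim)
        ids = np.array([FRAME_TYPES[f] for f in frame_types])
        ft_feat = self.frame_table[ids]                      # (T, dim)
        # Broadcast each frame's type embedding over its spatial grid.
        ft_feat = np.broadcast_to(ft_feat[:, None, None, :], (t, h, w, self.dim))
        fused = np.concatenate([mv_feat, ft_feat], axis=-1)  # (T, H, W, 2*dim)
        return fused @ self.w_fuse                           # (T, H, W, dim)


# Usage: 3 frames on a 4x5 block grid with zero motion.
enc = DualStreamMetadataEncoder(dim=8)
features = enc(np.zeros((3, 4, 5, 2)), ["I", "P", "B"])
print(features.shape)  # (3, 4, 5, 8)
```

Keeping the streams separate until fusion lets each metadata type use an encoding suited to its structure. Motion vectors vary per block, while frame type is a single categorical label per frame that must be spread across the grid.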
