Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models
Nasrin Rahimi, A. Murat Tekalp

TL;DR
This paper introduces inference-time strategies that improve temporal consistency and fidelity in zero-shot perceptual video restoration with image-based diffusion models. The strategies require no retraining; they work by guiding the denoising process and by ensembling multiple sampling trajectories.
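Here is a hedged toy illustration of why ensembling several stochastic restorations can raise a fidelity score such as PSNR. It is not the paper's MPES procedure, whose full name is truncated in the abstract. The diffusion sampler is replaced by a hypothetical `fake_stochastic_restoration` that adds independent Gaussian error to a known clean signal. `ensemble` is likewise a hypothetical helper that takes a plain pixel-wise mean.

```python
import math
import random

def psnr(estimate, reference, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length lists of pixel values."""
    mse = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    return 10.0 * math.log10(peak ** 2 / mse)

def fake_stochastic_restoration(clean, noise_std, rng):
    """Hypothetical stand-in for one diffusion sampling path.

    Returns the clean frame plus independent Gaussian error; a real path would
    come from running a pretrained sampler with its own random seed.
    """
    return [c + rng.gauss(0.0, noise_std) for c in clean]

def ensemble(paths):
    """Pixel-wise mean of several restorations of the same frame."""
    n = len(paths)
    return [sum(values) / n for values in zip(*paths)]

rng = random.Random(0)
clean = [rng.random() for _ in range(4096)]  # toy "frame" with values in [0, 1)
paths = [fake_stochastic_restoration(clean, 0.1, rng) for _ in range(8)]
single = psnr(paths[0], clean)
averaged = psnr(ensemble(paths), clean)
print(f"single path: {single:.2f} dB, 8-path mean: {averaged:.2f} dB")
```

With independent errors, averaging N paths divides the expected MSE by roughly N, which is about 9 dB for N = 8. Real sampling paths share the same prior and conditioning, so their errors are correlated and the gain is smaller. Plain averaging can also smooth away fine texture, trading some perceptual quality for fidelity.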
Contribution
It proposes two novel inference-time techniques, Perceptual Straightening Guidance (PSG) and MPES, that improve temporal coherence and fidelity in video restoration with pretrained diffusion models.
Findings
PSG improves temporal perceptual scores such as Fréchet Video Distance (FVD) and perceptual straightness.
MPES enhances fidelity metrics such as PSNR and SSIM.
Combined, these methods enable stable, high-quality video restoration without retraining.
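Perceptual straightness is commonly measured as the mean angle between successive displacement vectors of a sequence's per-frame representations. A perfectly straight trajectory has zero curvature. The sketch below assumes that per-frame feature vectors are already available. In practice they would come from some perceptual network, which is not specified here, so toy 2-D points stand in for them. `curvature` is a hypothetical helper, not the paper's evaluation code.

```python
import math

def curvature(trajectory):
    """Mean discrete curvature (degrees) of a sequence of feature vectors.

    trajectory: list of equal-length lists, one feature vector per frame
    (at least 3 frames). Each step's curvature is the angle between
    consecutive displacement vectors; 0 means the trajectory is straight.
    """
    diffs = [[b - a for a, b in zip(x0, x1)]
             for x0, x1 in zip(trajectory, trajectory[1:])]
    angles = []
    for v, w in zip(diffs, diffs[1:]):
        dot = sum(a * b for a, b in zip(v, w))
        norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
        cos = max(-1.0, min(1.0, dot / norm))  # clamp for floating-point safety
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)

straight = [[t, 2.0 * t] for t in range(5)]   # points on a line
zigzag = [[t, (-1.0) ** t] for t in range(5)]  # alternating up/down
print(curvature(straight), curvature(zigzag))
```

The straight sequence scores essentially zero, up to floating-point rounding. Every turn in the zigzag has the same angle, so its mean curvature equals that angle, about 127 degrees.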
Abstract
Diffusion models have emerged as powerful priors for single-image restoration, but their application to zero-shot video restoration suffers from temporal inconsistencies due to the stochastic nature of sampling and complexity of incorporating explicit temporal modeling. In this work, we address the challenge of improving temporal coherence in video restoration using zero-shot image-based diffusion models without retraining or modifying their architecture. We propose two complementary inference-time strategies: (1) Perceptual Straightening Guidance (PSG) based on the neuroscience-inspired perceptual straightening hypothesis, which steers the diffusion denoising process towards smoother temporal evolution by incorporating a curvature penalty in a perceptual space to improve temporal perceptual scores, such as Fréchet Video Distance (FVD) and perceptual straightness; and (2) Multi-Path…
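PSG is described as adding a curvature penalty in a perceptual space to steer denoising. The following is a minimal sketch of what one such guidance update could look like. It rests on several assumptions that do not come from the paper:

- The penalty is the mean curvature of the per-frame estimates.
- The perceptual map is the identity, so raw toy vectors are used directly.
- Gradients come from central finite differences rather than autodiff.
- `straightening_step`, `eta`, and `h` are hypothetical names and values.

In an actual sampler the update would be interleaved with the model's denoising steps rather than run on its own.

```python
import math

def mean_curvature(frames):
    """Mean angle (radians) between consecutive displacement vectors (>= 3 frames)."""
    diffs = [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]
    total = 0.0
    for v, w in zip(diffs, diffs[1:]):
        dot = sum(a * b for a, b in zip(v, w))
        norm = math.hypot(*v) * math.hypot(*w)
        total += math.acos(max(-1.0, min(1.0, dot / norm)))
    return total / (len(diffs) - 1)

def straightening_step(frames, eta=0.05, h=1e-5):
    """Hypothetical guidance update: move each frame estimate a small step
    down the numerical gradient of the curvature penalty.

    A real sampler would apply this to per-frame denoised estimates at each
    diffusion step, differentiating through a perceptual network; central
    finite differences keep this sketch standard-library only.
    """
    grads = []
    for i, frame in enumerate(frames):
        g = []
        for j in range(len(frame)):
            plus = [list(f) for f in frames]
            minus = [list(f) for f in frames]
            plus[i][j] += h
            minus[i][j] -= h
            g.append((mean_curvature(plus) - mean_curvature(minus)) / (2 * h))
        grads.append(g)
    return [[x - eta * gx for x, gx in zip(f, g)] for f, g in zip(frames, grads)]

frames = [[0.0, 0.0], [1.0, 0.3], [2.0, -0.2], [3.0, 0.4], [4.0, 0.0]]
before = mean_curvature(frames)
for _ in range(20):
    frames = straightening_step(frames)
after = mean_curvature(frames)
print(f"mean curvature: {before:.3f} rad -> {after:.3f} rad")
```

Because curvature depends only on directions, the penalty straightens the toy trajectory without pulling frames toward any particular target. In a restoration setting, the data-consistency terms of the sampler would be what keeps the frames faithful to the degraded input.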
