Learning to Refocus with Video Diffusion Models

SaiKiran Tedla; Zhoutong Zhang; Xuaner Zhang; Shumian Xin

arXiv:2512.19823 · cs.CV · December 30, 2025

TL;DR

This paper presents a method that uses video diffusion models to generate a focal stack from a single defocused image, enabling realistic, interactive post-capture refocusing.

Contribution

It introduces a new approach to post-capture refocusing based on video diffusion models and releases a large-scale focal stack dataset, captured under diverse real-world smartphone conditions, for future research.

Findings

01 Outperforms existing methods in perceptual quality

02 Robust across diverse real-world scenarios

03 Enables interactive focus editing

Abstract

Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io
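To illustrate how a focal stack enables interactive refocusing, here is a minimal sketch of the viewing side only. It assumes the stack is already available as a NumPy array of grayscale frames with shape (N, H, W), and it picks the frame that is sharpest around a tapped pixel. The function names `local_sharpness` and `refocus_index` and the Laplacian-based sharpness measure are illustrative assumptions, not part of the paper's method or released code.

```python
import numpy as np

def local_sharpness(frame, y, x, radius=4):
    """Mean squared 4-neighbour Laplacian in a window around (y, x).

    Hypothetical helper: a common focus measure, not taken from the paper.
    Assumes the frame is large enough that the clipped window is non-empty.
    """
    h, w = frame.shape
    # Clip the window so every pixel in it has all four neighbours.
    y0, y1 = max(y - radius, 1), min(y + radius + 1, h - 1)
    x0, x1 = max(x - radius, 1), min(x + radius + 1, w - 1)
    center = frame[y0:y1, x0:x1]
    lap = (frame[y0 - 1:y1 - 1, x0:x1] + frame[y0 + 1:y1 + 1, x0:x1]
           + frame[y0:y1, x0 - 1:x1 - 1] + frame[y0:y1, x0 + 1:x1 + 1]
           - 4.0 * center)
    return float(np.mean(lap ** 2))

def refocus_index(stack, y, x, radius=4):
    """Index of the focal-stack frame that is sharpest around (y, x)."""
    scores = [local_sharpness(frame, y, x, radius) for frame in stack]
    return int(np.argmax(scores))
```

In an interactive viewer, a tap at pixel (y, x) would display `stack[refocus_index(stack, y, x)]`. Generating the stack from a single defocused image is the part the paper addresses with a video diffusion model, and this sketch does not attempt it.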

Code & Models

Models

🤗 tedlasai/learn2refocus (model)

Taxonomy

Topics: Image Processing Techniques and Applications · Visual Attention and Saliency Detection · Image Enhancement Techniques