FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion
Hugo Caselles-Dupré (1), Mathis Koroglu (1, 2), Guillaume Jeanneret (2), Arnaud Dapogny (2), Matthieu Cord (2) ((1) Obvious Research, Paris, France, (2) Institute of Intelligent Systems, Robotics - Sorbonne University, Paris, France)

TL;DR
FrescoDiffusion is a training-free method that enhances 4K image-to-video generation by combining tiled denoising with a global latent prior, ensuring high-resolution detail and spatial-temporal coherence.
Contribution
It introduces a novel, training-free approach for large-format I2V generation that fuses tiled denoising with a precomputed global latent reference for improved coherence.
Findings
Improved global consistency and fidelity over tiled baselines.
Efficient 4K image-to-video generation with fine detail preservation.
Enables a controllable trade-off between creativity and consistency.
Abstract
Diffusion-based image-to-video (I2V) models are increasingly effective, yet they struggle to scale to ultra-high-resolution inputs (e.g., 4K). Generating videos at the model's native resolution often loses fine-grained structure, whereas high-resolution tiled denoising preserves local detail but breaks global layout consistency. This failure mode is particularly severe in the fresco animation setting: monumental artworks containing many distinct characters, objects, and semantically different sub-scenes that must remain spatially coherent over time. We introduce FrescoDiffusion, a training-free method for coherent large-format I2V generation from a single complex image. The key idea is to augment tiled denoising with a precomputed latent prior: we first generate a low-resolution video at the underlying model resolution and upsample its latent trajectory to obtain a global reference that…
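The abstract is cut off before it states how the global reference enters the tiled denoising step, so the following is only an illustrative sketch and not the authors' formulation. It assumes a MultiDiffusion-style average of overlapping tile predictions, followed by a convex blend toward the upsampled low-resolution latent. The names `tile_starts`, `fused_tiled_step`, `denoise_tile`, and the weight `lam` are hypothetical, and `lam` merely stands in for the creativity/consistency trade-off mentioned in the findings.

```python
import numpy as np


def tile_starts(size, tile, stride):
    """Start offsets so that tiles of length `tile` cover [0, size)."""
    if tile >= size:
        return [0]
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:
        starts.append(size - tile)  # extra tile so the border is covered
    return starts


def fused_tiled_step(latent, prior, denoise_tile, tile=8, stride=4, lam=0.5):
    """One hypothetical fusion step on a latent whose last two axes are (H, W).

    `denoise_tile` is a placeholder for a per-tile model prediction.
    `prior` is the global reference, already upsampled to the latent's shape.
    """
    acc = np.zeros_like(latent, dtype=float)
    cnt = np.zeros_like(latent, dtype=float)
    height, width = latent.shape[-2:]
    for y in tile_starts(height, tile, stride):
        for x in tile_starts(width, tile, stride):
            window = (..., slice(y, y + tile), slice(x, x + tile))
            acc[window] += denoise_tile(latent[window])
            cnt[window] += 1.0
    tiled = acc / cnt  # average of overlapping tile predictions
    # Assumed regularizer: the minimizer of
    # (1 - lam) * ||z - tiled||^2 + lam * ||z - prior||^2
    return (1.0 - lam) * tiled + lam * prior


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_res = rng.standard_normal((4, 2, 4, 4))  # (channels, frames, h, w)
    # Nearest-neighbour upsampling, an arbitrary choice for this sketch.
    prior = low_res.repeat(4, axis=-2).repeat(4, axis=-1)
    latent = rng.standard_normal(prior.shape)
    out = fused_tiled_step(latent, prior, lambda z: 0.9 * z, lam=0.3)
    print(out.shape)
```

In this sketch, `lam = 0` reduces to plain tiled denoising and `lam = 1` returns the global reference unchanged, which is one simple way to picture a tunable balance between local detail and global layout.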
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Vision and Imaging
