TL;DR
Tiled Prompts introduces a method for image and video super-resolution that generates tile-specific prompts to improve local detail and reduce artifacts caused by global prompts.
Contribution
The paper presents a unified framework for super-resolution that addresses prompt misguidance by creating localized prompts for each image or video tile.
Findings
Improves perceptual quality and fidelity in super-resolution tasks.
Reduces hallucinations and tile-level artifacts.
Achieves consistent gains over global-prompt baselines.
Abstract
Text-conditioned diffusion models have advanced image and video super-resolution by using prompts as semantic priors, and modern super-resolution pipelines typically rely on latent tiling to scale to high resolutions. In practice, a single global caption is shared across all latent tiles, which often causes prompt misguidance. Specifically, a coarse global prompt often misses localized details (errors of omission) and provides locally irrelevant guidance (errors of commission), both of which lead to substandard results at the tile level. To solve this, we propose Tiled Prompts, a unified framework for image and video super-resolution that generates a tile-specific prompt for each latent tile and performs super-resolution under locally text-conditioned posteriors to resolve prompt misguidance with minimal overhead. Our experiments on high-resolution real-world images and videos show that tiled prompts…
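The pairing of latent tiles with their own prompts, as described in the abstract, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `tile_boxes`, `tiled_prompts`, the `caption_tile` callable, the tile size, the overlap, and the fallback to a global prompt are hypothetical and are not taken from the paper, which may caption, tile, and blend differently.

```python
from typing import Callable, List, Tuple

# (top, left, bottom, right) in latent-grid units; hypothetical convention.
Box = Tuple[int, int, int, int]


def tile_starts(size: int, tile: int, stride: int) -> List[int]:
    """Start offsets whose windows of length `tile` cover [0, size)."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last window sits flush with the far edge
    return starts


def tile_boxes(height: int, width: int, tile: int, overlap: int) -> List[Box]:
    """Split a height x width latent grid into overlapping square tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, stride)
        for left in tile_starts(width, tile, stride)
    ]


def tiled_prompts(
    boxes: List[Box],
    caption_tile: Callable[[Box], str],
    fallback: str = "",
) -> List[Tuple[Box, str]]:
    """Pair each tile with its own prompt.

    `caption_tile` stands in for whatever captioner describes the region
    under a tile. Falling back to a global prompt when the captioner
    returns nothing is an assumption of this sketch.
    """
    pairs = []
    for box in boxes:
        text = caption_tile(box).strip()
        pairs.append((box, text or fallback))
    return pairs


if __name__ == "__main__":
    # Toy captioner: pretends the top rows show sky and the rest show grass.
    def toy_captioner(box: Box) -> str:
        return "clear blue sky" if box[0] == 0 else "blades of grass"

    boxes = tile_boxes(height=10, width=8, tile=4, overlap=1)
    for box, prompt in tiled_prompts(boxes, toy_captioner, "a meadow"):
        print(box, "->", prompt)
```

In a global-prompt baseline every tile would receive the same string (here "a meadow"). In the tiled variant each tile's denoising call would be conditioned on its own prompt instead, and how overlapping tiles are then merged is left out of this sketch.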
