TL;DR
Latent Wavelet Diffusion (LWD) is a novel, efficient training framework that enhances detail and texture fidelity in ultra-high-resolution image synthesis without increasing inference costs.
Contribution
LWD introduces a frequency-aware masking strategy, derived from wavelet energy maps, and a scale-consistent VAE objective. Together they improve the quality of high-resolution image synthesis without adding inference cost.
Findings
LWD improves perceptual quality and lowers FID (lower is better) across multiple baselines.
LWD requires no architectural changes and adds no inference cost.
LWD focuses training on detail-rich regions of the latent space using wavelet energy maps.
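To make the idea of a wavelet energy map concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the single-level Haar transform, the max-normalization, the `boost` parameter, and the helper names `haar_dwt2`, `wavelet_energy_map`, and `energy_weighted_mse` are all hypothetical choices made for this example. The summary does not say which wavelet, normalization, or masking rule LWD actually uses.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W must be even. Returns the approximation band and three
    detail bands, each of shape (H/2, W/2).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    detail_cols = (a - b + c - d) / 2.0  # differences between columns
    detail_rows = (a + b - c - d) / 2.0  # differences between rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return approx, detail_cols, detail_rows, detail_diag


def wavelet_energy_map(latent, eps=1e-8):
    """Sum squared detail coefficients over channels, scaled to [0, 1].

    `latent` has shape (C, H, W). The result has shape (H/2, W/2) and is
    large where the latent carries high-frequency content.
    (Assumption: max-normalization; other schemes are possible.)
    """
    _, h, w = latent.shape
    energy = np.zeros((h // 2, w // 2))
    for channel in latent:
        _, dc, dr, dd = haar_dwt2(channel)
        energy += dc ** 2 + dr ** 2 + dd ** 2
    return energy / (energy.max() + eps)


def energy_weighted_mse(pred, target, latent, boost=1.0):
    """MSE in which detail-rich latent positions receive extra weight.

    (Assumption: weight = 1 + boost * energy; hypothetical rule.)
    """
    energy = wavelet_energy_map(latent)
    # Upsample the half-resolution map back to the latent grid.
    full = np.repeat(np.repeat(energy, 2, axis=0), 2, axis=1)
    weights = 1.0 + boost * full
    return float(np.mean(weights[None] * (pred - target) ** 2))
```

Because the weighting only changes the training loss, a scheme of this kind leaves the sampling procedure untouched, which is consistent with the claim that LWD adds no inference cost.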
Abstract
High-resolution image synthesis remains a core challenge in generative modeling, particularly in balancing computational efficiency with the preservation of fine-grained visual detail. We present Latent Wavelet Diffusion (LWD), a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis. LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space. This is complemented by a scale-consistent VAE objective to ensure high spectral fidelity. The primary advantage of our approach is its efficiency: LWD requires no architectural modifications and adds zero additional cost during inference, making it a practical solution for scaling existing models. Across multiple strong baselines, LWD consistently…
