FrequencyBooster: Full-Frequency Modeling for High-Fidelity Pixel Diffusion
Lichen Ma, Zipeng Guo, Yu He, Xiaolong Fu, Luohang Liu, Jingling Fu, Junshi Huang, Yan Li

TL;DR
FrequencyBooster introduces a full-frequency modeling framework for pixel diffusion that preserves high-frequency detail alongside global structure, achieving state-of-the-art image generation quality while remaining computationally efficient.
Contribution
The paper presents a novel high-capacity decoder and a full-frequency modeling approach that surpasses prior pixel diffusion models in fidelity and efficiency.
Findings
Achieves a state-of-the-art FID of 1.60 at 256x256 resolution in 320 epochs.
Attains an FID of 1.69 at 512x512 resolution, outperforming existing models.
Effectively balances high-frequency detail preservation with global structural integrity.
Abstract
To circumvent the inherent fidelity bottlenecks and optimization misalignment of VAE-based latent diffusion, pixel-space diffusion models have emerged as a compelling end-to-end paradigm. However, existing pixel diffusion models often struggle to balance computational efficiency with the preservation of high-frequency details. They frequently resort to patch-based compression or restricted local decoding, leading to a "spectral compromise" where high-frequency and fine-grained pixel information are suppressed. To address these challenges, we propose FrequencyBooster, a novel framework designed to empower pixel diffusion with full-frequency modeling capabilities without prohibitive overhead. The core of our method is a high-capacity decoder that specializes in extracting exhaustive high-frequency details and low-frequency semantics, the latter of which is derived from a…
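The "spectral compromise" mentioned in the abstract can be illustrated with a small, generic sketch. This is not the paper's method or its evaluation: it only shows, on a synthetic noise image, how averaging over patches removes energy from the high end of the spectrum. The function names `high_freq_fraction` and `patch_average`, the patch size of 4, and the 0.25 cycles/pixel cutoff are all assumptions chosen for illustration.

```python
import numpy as np


def high_freq_fraction(img, cutoff=0.25):
    """Share of spectral energy at radial frequencies above `cutoff`.

    `cutoff` is in cycles per pixel; 0.25 is an arbitrary illustrative choice.
    """
    h, w = img.shape
    power = np.abs(np.fft.fft2(img)) ** 2
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles/pixel
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return power[radius > cutoff].sum() / power.sum()


def patch_average(img, p=4):
    """Replace each p x p patch by its mean (a crude stand-in for patch compression)."""
    h, w = img.shape
    pooled = img.reshape(h // p, p, w // p, p).mean(axis=(1, 3))
    # Nearest-neighbour upsampling back to the original resolution.
    return np.repeat(np.repeat(pooled, p, axis=0), p, axis=1)


# Synthetic white-noise image: energy is spread evenly over all frequencies.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))

before = high_freq_fraction(img)
after = high_freq_fraction(patch_average(img))
print(f"high-frequency energy share: before={before:.3f}, after={after:.3f}")
```

On this toy input the high-frequency share drops after patch averaging, which is the kind of loss a full-frequency model would need to avoid. Real images and real patch embeddings behave differently, so the numbers here say nothing about any specific model.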
