Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance
Weitao Du

TL;DR
This paper introduces Frequency-Forcing, a method that guides pixel generation with an auxiliary low-frequency stream derived from the data itself, improving image synthesis quality without relying on an external pretrained encoder.
Contribution
It proposes a self-derived, data-adapted frequency guidance mechanism that enhances scale-ordered image generation, building on and simplifying prior frequency flow methods.
Findings
Frequency-Forcing improves FID on ImageNet-256 over baselines.
It naturally integrates with semantic streams for further gains.
The method avoids external pretrained encoders by using a lightweight wavelet transform.
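To make the "lightweight wavelet transform" finding concrete, here is a minimal sketch of how a low-frequency auxiliary signal could be derived from an image without any learned encoder. The choice of a one-level orthonormal Haar wavelet and the function name `haar_lowpass` are assumptions made for illustration; the summary does not say which wavelet or how many levels the paper uses.

```python
import numpy as np

def haar_lowpass(image: np.ndarray) -> np.ndarray:
    """Return the LL (approximation) band of a one-level 2D Haar transform.

    Assumption: Haar is used only as an example wavelet; the source does
    not name the transform. `image` has shape (H, W) or (H, W, C) with
    even H and W.
    """
    h, w = image.shape[:2]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a = image[0::2, 0::2]
    b = image[0::2, 1::2]
    c = image[1::2, 0::2]
    d = image[1::2, 1::2]
    # Orthonormal Haar low-pass applied along rows and columns:
    # (1/sqrt(2))^2 * (a + b + c + d) = (a + b + c + d) / 2
    return (a + b + c + d) / 2.0

# Usage: a 4x4 ramp image reduces to a 2x2 coarse summary.
coarse = haar_lowpass(np.arange(16, dtype=float).reshape(4, 4))
```

The result is a half-resolution array that keeps coarse structure and discards fine detail, which is the kind of signal an earlier-maturing low-frequency stream would carry.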
Abstract
While standard flow-matching models transport noise to data uniformly, incorporating an explicit generation order - specifically, establishing coarse, low-frequency structure before fine detail - has proven highly effective for synthesizing natural images. Two recent works offer distinct paradigms for this. K-Flow imposes a hard frequency constraint by reinterpreting a frequency scaling variable as flow time, running the trajectory inside a transformed amplitude space. Latent Forcing provides a soft ordering mechanism by coupling the pixel flow with an auxiliary semantic latent flow via asynchronous time schedules, leaving the pixel interpolation path itself untouched. Viewed from the angle of improving pixel generation, we observe that forcing - guiding generation with an earlier-maturing auxiliary stream - offers a highly compatible route to scale-ordered generation without rewriting…
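The soft ordering described in the abstract, where an auxiliary stream matures earlier than the pixel stream while the pixel interpolation path stays untouched, can be sketched as follows. Everything specific here is an assumption for illustration: the linear noise-to-data path with t = 0 as noise, the clipped linear schedule in `aux_time`, the `lead` factor, and all function names. The paper's actual schedules may differ.

```python
import numpy as np

def interpolate(noise, data, t):
    """Linear flow-matching path (assumed convention): t=0 is noise, t=1 is data."""
    return (1.0 - t) * noise + t * data

def aux_time(t, lead=2.0):
    """Hypothetical asynchronous schedule: the auxiliary stream runs
    `lead` times faster than the pixel stream and saturates at 1."""
    return np.minimum(1.0, lead * np.asarray(t, dtype=float))

def forced_pair(pixel_noise, pixels, aux_noise, aux, t, lead=2.0):
    """Return the noisy pixel state and the earlier-maturing auxiliary
    state at the same pixel time t."""
    x_t = interpolate(pixel_noise, pixels, t)                 # pixel path unchanged
    a_t = interpolate(aux_noise, aux, aux_time(t, lead))      # auxiliary path runs ahead
    return x_t, a_t

# Usage: at pixel time 0.5 the auxiliary stream is already fully clean.
rng = np.random.default_rng(0)
pixels, aux = rng.normal(size=(4, 4)), rng.normal(size=(2, 2))
x_t, a_t = forced_pair(rng.normal(size=(4, 4)), pixels,
                       rng.normal(size=(2, 2)), aux, t=0.5)
```

With `lead=2.0`, the auxiliary stream reaches its clean value halfway through the pixel trajectory, so a model conditioned on both states sees the coarse structure before the fine detail is resolved.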
