DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
Noam Issachar, Guy Yariv, Sagie Benaim, Yossi Adi, Dani Lischinski, Raanan Fattal

TL;DR
DyPE is a training-free method that enables pre-trained diffusion transformers to generate ultra-high-resolution images by dynamically adjusting positional encodings based on the diffusion process's spectral progression.
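The "spectral progression" mentioned above can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a linear noising path `x_t = (1 - t) * x0 + t * eps`, an image power spectrum that decays as `1 / f**omega`, and white unit-variance noise. The function name `resolved_cutoff` is hypothetical. Under those assumptions, it reports the highest frequency whose signal power still exceeds the noise power at a given noise level.

```python
import numpy as np

def resolved_cutoff(t, freqs, omega=2.0):
    """Highest frequency whose signal power exceeds the noise power at level t.

    Assumptions (illustrative, not from the paper):
    x_t = (1 - t) * x0 + t * eps, signal power ~ 1 / f**omega, white noise.
    """
    signal = (1.0 - t) ** 2 / freqs ** omega
    noise = t ** 2 * np.ones_like(freqs)
    visible = freqs[signal > noise]
    return float(visible.max()) if visible.size else 0.0

freqs = np.arange(1, 513, dtype=np.float64)
# Noise levels from nearly pure noise (0.8) to nearly clean (0.001).
noise_levels = (0.8, 0.4, 0.15, 0.03, 0.001)
cutoffs = [resolved_cutoff(t, freqs) for t in noise_levels]
print(cutoffs)  # [0.0, 1.0, 5.0, 32.0, 512.0]
```

In this toy model the cutoff rises as the noise level falls, so coarse structure becomes distinguishable from noise well before fine detail does.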
Contribution
DyPE introduces a novel spectral-based dynamic positional encoding adjustment that allows ultra-high-resolution image synthesis without additional training or sampling costs.
Findings
Enables generation of images with over 16 million pixels.
Consistently improves performance on multiple benchmarks.
Achieves state-of-the-art fidelity in ultra-high-resolution image generation.
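To put the 16-million-pixel figure in context, the following back-of-the-envelope sketch counts image tokens and pairwise attention interactions. The 16-pixel token stride (for example, 8x latent downsampling followed by 2x2 patching) and the 1024x1024 reference resolution are assumptions chosen for illustration, not values stated here. The helper `attention_pairs` is hypothetical.

```python
def attention_pairs(height, width, pixels_per_token_side=16):
    """Return (token count, number of token pairs) for full self-attention.

    pixels_per_token_side is an assumed stride, not a value from the paper.
    """
    tokens = (height // pixels_per_token_side) * (width // pixels_per_token_side)
    return tokens, tokens * tokens

base_tokens, base_pairs = attention_pairs(1024, 1024)
big_tokens, big_pairs = attention_pairs(4096, 4096)

print(4096 * 4096)              # 16777216 pixels
print(base_tokens, big_tokens)  # 4096 65536
print(big_pairs // base_pairs)  # 256
```

Under these assumptions, a 16x increase in pixel count gives a 256x increase in attention pairs, which is why training directly at such resolutions is costly.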
Abstract
Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum with the current stage of the generative process. This approach allows us to generate images at…
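A minimal sketch of what a step-dependent positional encoding could look like follows. It is not the authors' implementation. It assumes 1D rotary position embeddings (RoPE) with position interpolation, a timestep `t` that runs from 1 (pure noise) to 0 (clean image), and a hypothetical schedule `dynamic_scale(t, s) = s ** (t ** power)` that equals the full extrapolation factor `s` at the start of sampling and 1 at the end. All function names and the `power` parameter are illustrative.

```python
import numpy as np

def dynamic_scale(t, s, power=2.0):
    """Hypothetical schedule: s at t=1 (pure noise), 1 at t=0 (clean image)."""
    return s ** (t ** power)

def rope_angles(positions, dim, t, s, base=10000.0):
    """RoPE rotation angles with time-dependent position interpolation."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    scaled_pos = positions / dynamic_scale(t, s)  # compress positions early on
    return np.outer(scaled_pos, inv_freq)         # shape: (tokens, dim / 2)

def apply_rope(x, angles):
    """Rotate consecutive feature pairs of x (tokens, dim) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

train_len, target_len, dim = 64, 256, 8   # assumed sizes for illustration
s = target_len / train_len                # extrapolation factor
positions = np.arange(target_len, dtype=np.float64)

early = rope_angles(positions, dim, t=1.0, s=s)  # positions squeezed into the trained range
late = rope_angles(positions, dim, t=0.0, s=s)   # unscaled positions, full frequency range
```

Early in sampling, the largest effective position stays below `train_len`, which favors the low-frequency layout. By the final steps the encoding is unscaled, so the highest positional frequencies are available for fine detail. Recomputing the angles per step touches only a small lookup table, so the sketch adds no extra model evaluations.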
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
- Clear methodological contribution: the paper makes an important observation (higher-frequency components evolve at a fairly constant rate, while lower frequencies stop evolving early) that is validated empirically. Based on it, the authors propose a method that dynamically adjusts the model's positional encoding and can be implemented on top of existing methods.
- Clear positioning of the paper's contribution with respect to previous related work. Great exposition of previous wor
No major weaknesses identified. A minor one is whether the authors could include results for other diffusion models (besides FiTv2).
(1) The paper clearly identifies the gap between static positional extrapolation and the dynamic frequency progression required for high-resolution image generation. (2) DyPE extends existing positional encoding formulations with a time-dependent scaling term κ(t) that smoothly transitions from large scaling at early timesteps to unity at later ones, making it easy to integrate into existing extrapolation methods. (3) The method demonstrates consistent quantitative and qualitative improvements
(1) Baseline comparison: Lumina-Next already incorporates timestep dynamics by interpolating from PI to NTK-aware scaling as denoising progresses. It would be important to include Lumina-Next as a baseline in Tables 1, 2, and 3 for a fair and comprehensive comparison. (2) Limited evaluation scope: The experiments focus primarily on FLUX and FiTv2 for image generation. Demonstrating results on additional tasks, such as image editing (e.g., EasyControl [1]) or video generation (e.g., Wan 2.1/2.2
1. Adjusting the positional encoding along the sampling process seems non-trivial
Missing comparisons with existing work, leading to an incomplete evaluation and misleading conclusions. 1. For example, FreCaS (ICLR 2025), I-Max (arXiv, 2024), HiFlow (arXiv 2025.04, NeurIPS 2025), and Diffusion-4K (CVPR 2025) are four methods that generate images at resolutions beyond the training size on SD3 or FLUX models. All were publicly available well before the ICLR deadline (2025.09.25). Omitting comparisons with these related works therefore makes the conclusions misleading and unsound. 2. Eve
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques
