DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Noam Issachar; Guy Yariv; Sagie Benaim; Yossi Adi; Dani Lischinski; Raanan Fattal

arXiv:2510.20766·cs.CV·January 30, 2026

Open Access · 3 Reviews

TL;DR

DyPE is a training-free method that enables pre-trained diffusion transformers to generate ultra-high-resolution images by dynamically adjusting positional encodings based on the diffusion process's spectral progression.

Contribution

DyPE introduces a novel spectral-based dynamic positional encoding adjustment that allows ultra-high-resolution image synthesis without additional training or sampling costs.

Findings

01

Enables generation of images with over 16 million pixels.

02

Consistently improves performance on multiple benchmarks.

03

Achieves state-of-the-art fidelity in ultra-high-resolution image generation.

Abstract

Diffusion Transformer models can generate images with remarkable fidelity and detail, yet training them at ultra-high resolutions remains extremely costly due to the self-attention mechanism's quadratic scaling with the number of image tokens. In this paper, we introduce Dynamic Position Extrapolation (DyPE), a novel, training-free method that enables pre-trained diffusion transformers to synthesize images at resolutions far beyond their training data, with no additional sampling cost. DyPE takes advantage of the spectral progression inherent to the diffusion process, where low-frequency structures converge early, while high frequencies take more steps to resolve. Specifically, DyPE dynamically adjusts the model's positional encoding at each diffusion step, matching its frequency spectrum with the current stage of the generative process. This approach allows us to generate images at…
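The spectral-progression claim in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a power-law image spectrum (power proportional to 1/f^alpha), white Gaussian noise, and a linear interpolation x_t = (1 - t) * x0 + t * noise, with t running from 1 (pure noise) down to 0 (clean image). The function names are hypothetical. Under these assumptions, each frequency rises above the noise floor at a different time, and low frequencies do so first.

```python
import numpy as np

def snr_per_frequency(freqs, t, alpha=2.0):
    """Per-frequency SNR of x_t = (1 - t) * x0 + t * noise.

    Assumes signal power ~ 1 / f**alpha and unit-variance white noise
    (flat spectrum). Both are illustrative assumptions.
    """
    signal_power = (1.0 - t) ** 2 / freqs ** alpha
    noise_power = t ** 2
    return signal_power / noise_power

def emergence_time(freqs, alpha=2.0):
    """Time t at which a frequency's SNR equals 1.

    Solving (1 - t)**2 / f**alpha = t**2 gives t = 1 / (1 + f**(alpha / 2)).
    Sampling runs from t = 1 to t = 0, so a larger value means the
    frequency emerges earlier in the sampling process.
    """
    return 1.0 / (1.0 + freqs ** (alpha / 2.0))

freqs = np.array([1.0, 4.0, 16.0, 64.0])
times = emergence_time(freqs)
for f, t in zip(freqs, times):
    print(f"frequency {f:5.1f} emerges at t = {t:.4f}")
```

In this toy model the emergence time decreases monotonically with frequency, which is consistent with the qualitative picture that coarse structure settles early while fine detail is resolved in later steps.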

Peer Reviews

Decision·ICLR 2026 Conference Desk Rejected Submission

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- Clear methodological contribution: the paper makes an important observation (higher frequency components show a fairly constant evolution while lower frequencies appear to cease to evolve early) that is validated empirically and based on that the authors propose a method that dynamically adjusts the model's positional encoding and can be implemented on top of existing methods
- Clear positioning of the paper's contribution with respect to previous related work. Great exposition of previous wor

Weaknesses

No major weaknesses identified. A minor one is whether the authors could include results for other diffusion models (besides FiTv2).

Reviewer 02 · Rating 6 · Confidence 4

Strengths

(1) The paper clearly identifies the gap between static positional extrapolation and the dynamic frequency progression required for high-resolution image generation. (2) DyPE extends existing positional encoding formulations with a time-dependent scaling term κ(t) that smoothly transitions from large scaling at early timesteps to unity at later ones, making it easy to integrate into existing extrapolation methods. (3) The method demonstrates consistent quantitative and qualitative improvements
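The time-dependent scaling term κ(t) mentioned in point (2) can be sketched as follows. This is a minimal illustration, not the paper's formulation: the schedule `kappa` (a power law in t), its `power` parameter, and the choice to plug κ(t) into NTK-aware base rescaling are all assumptions made here for the example. Time t runs from 1 (pure noise, early sampling steps) to 0 (clean image, late steps).

```python
import numpy as np

def kappa(t, max_scale, power=2.0):
    """Hypothetical schedule: max_scale at t = 1, decaying to 1.0 at t = 0."""
    return 1.0 + (max_scale - 1.0) * t ** power

def rope_frequencies(dim, base=10000.0):
    """Standard RoPE inverse frequencies for an even head dimension."""
    return base ** (-np.arange(0, dim, 2) / dim)

def dynamic_ntk_frequencies(dim, t, max_scale, base=10000.0):
    """NTK-aware base rescaling with the static scale replaced by kappa(t).

    The NTK-aware rule multiplies the base by s**(dim / (dim - 2)), which
    leaves the highest frequency unchanged and divides the lowest by s.
    """
    s = kappa(t, max_scale)
    scaled_base = base * s ** (dim / (dim - 2))
    return rope_frequencies(dim, scaled_base)

# Example: 4x extrapolation with a 64-dimensional rotary embedding.
early = dynamic_ntk_frequencies(64, t=1.0, max_scale=4.0)  # full scaling
late = dynamic_ntk_frequencies(64, t=0.0, max_scale=4.0)   # no scaling
```

With this schedule the encoding starts fully rescaled, which stretches the low-frequency components over the larger token grid, and relaxes to the original frequencies as sampling approaches the clean image.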

Weaknesses

(1) Baseline comparison: Lumina-Next already incorporates timestep dynamics by interpolating from PI to NTK-aware scaling as denoising progresses. It would be important to include Lumina-Next as a baseline in Tables 1, 2, and 3 for a fair and comprehensive comparison. (2) Limited evaluation scope: The experiments focus primarily on FLUX and FiTv2 for image generation. Demonstrating results on additional tasks, such as image editing (e.g., EasyControl [1]) or video generation (e.g., Wan 2.1/2.2

Reviewer 03 · Rating 2 · Confidence 5

Strengths

1. Adjusting the positional encoding along the sampling process seems non-trivial.

Weaknesses

Missing comparisons with existing work, leading to an incomplete evaluation and misleading conclusions. 1. For example, FreCaS (ICLR 2025), I-Max (arXiv, 2024), HiFlow (arXiv 2025.04, NeurIPS 2025), and Diffusion-4K (CVPR 2025) are four methods that generate images at resolutions beyond the training size on SD3 or FLUX models. They were also publicly available well before the ICLR deadline (2025.09.25). Omitting comparisons with these related works therefore makes the conclusions misleading and unsound. 2. Eve

Taxonomy

Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques