UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers
Min Zhao, Bokai Yan, Xue Yang, Hongzhou Zhu, Jintao Zhang, Shilong Liu, Chongxuan Li, Jun Zhu

TL;DR
UltraImage combines recursive dominant frequency correction with entropy-guided adaptive attention concentration, letting image diffusion transformers generate beyond their training resolution with less content repetition and higher quality.
Contribution
The paper presents a novel frequency correction technique and entropy-guided attention mechanism to address resolution extrapolation challenges in image diffusion transformers.
Findings
Outperforms prior methods on the Qwen-Image and Flux models.
Enables generation of images up to 6K×6K resolution.
Reduces content repetition and improves visual fidelity.
Abstract
Recent image diffusion transformers achieve high-fidelity generation, but struggle to generate images beyond their training resolution, suffering from content repetition and quality degradation. In this work, we present UltraImage, a principled framework that addresses both issues. Through frequency-wise analysis of positional embeddings, we identify that repetition arises from the periodicity of the dominant frequency, whose period aligns with the training resolution. We introduce a recursive dominant frequency correction to constrain it within a single period after extrapolation. Furthermore, we find that quality degradation stems from diluted attention and thus propose entropy-guided adaptive attention concentration, which assigns higher focus factors to sharpen local attention for fine detail and lower ones to global attention patterns to preserve structural consistency. Experiments show that…
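To make the entropy-guided idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the focus factor multiplies the pre-softmax attention logits and is interpolated linearly from the normalized entropy of each attention row. The function names and the `low`/`high` bounds are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(n): 0 for one-hot, 1 for uniform."""
    n = len(probs)
    if n <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def focus_factor(probs, low=1.0, high=1.5):
    """Hypothetical rule (an assumption, not taken from the paper):
    low-entropy (local) rows get a factor near `high`,
    high-entropy (global) rows get a factor near `low`."""
    return high - (high - low) * normalized_entropy(probs)


def concentrate(logits, low=1.0, high=1.5):
    """Rescale one row of attention logits by its entropy-derived factor."""
    factor = focus_factor(softmax(logits), low, high)
    return softmax([factor * x for x in logits]), factor


# One peaked ("local") row and one nearly flat ("global") row.
local_row = [4.0, 0.0, 0.0, 0.0]
global_row = [0.2, 0.1, 0.0, 0.1]

local_probs, local_factor = concentrate(local_row)
global_probs, global_factor = concentrate(global_row)
print("local factor:", local_factor, "global factor:", global_factor)
```

Under these assumptions the peaked row receives the larger factor and is sharpened further, while the nearly flat row is left almost unchanged. This mirrors the qualitative behavior the abstract describes for local versus global attention patterns.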
Taxonomy
Topics: Image Enhancement Techniques · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
