EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution

Haizhen Xie; Kunpeng Du; Qiangyu Yan; Sen Lu; Jianhong Han; Hanting Chen; Hailin Hu; Jie Hu

arXiv:2505.05209·cs.CV·May 12, 2026

TL;DR

EAM is a diffusion transformer-based method for blind super-resolution that combines a new guiding block ($Ψ$-DiT), a progressive masking strategy, and subject-aware prompts to outperform previous U-Net-based methods.

Contribution

The paper presents EAM, a diffusion transformer-based BSR method whose novel guiding block, progressive masking strategy, and subject-aware prompts improve restoration performance and generalization.

Findings

01. EAM achieves state-of-the-art results on multiple datasets.

02. EAM outperforms U-Net-based approaches in quantitative metrics.

03. The proposed strategies reduce training costs and enhance image restoration quality.

Abstract

Utilizing pre-trained Text-to-Image (T2I) diffusion models to guide Blind Super-Resolution (BSR) has become a predominant approach in the field. While T2I models have traditionally relied on U-Net architectures, recent advancements have demonstrated that Diffusion Transformers (DiT) achieve significantly higher performance in this domain. In this work, we introduce Enhancing Anything Model (EAM), a novel BSR method that leverages DiT and outperforms previous U-Net-based approaches. We introduce a novel block, $Ψ$-DiT, which effectively guides the DiT to enhance image restoration. This block employs a low-resolution latent as a separable flow injection control, forming a triple-flow architecture that effectively leverages the prior knowledge embedded in the pre-trained DiT. To fully exploit the prior guidance capabilities of T2I models and enhance their generalization in BSR, we…
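
The abstract does not spell out how the triple-flow block mixes its inputs, so the following is only an illustrative sketch of one plausible reading: three token streams (text, noisy latent, low-resolution latent) each keep their own projection weights and are mixed through a single joint attention, in the style of multi-stream DiT blocks. The function `triple_flow_attention`, the stream names, and all shapes are hypothetical and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or token features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def triple_flow_attention(streams, weights):
    """Joint attention over several streams with per-stream projections.

    streams: dict name -> token matrix (n_tokens x dim)
    weights: dict name -> (Wq, Wk, Wv), each dim x dim (assumed, hypothetical)
    Returns a dict with the same names and token counts as `streams`.
    """
    names = list(streams)
    q, k, v, sizes = [], [], [], []
    for name in names:
        wq, wk, wv = weights[name]           # separate weights per stream
        q += matmul(streams[name], wq)
        k += matmul(streams[name], wk)
        v += matmul(streams[name], wv)
        sizes.append(len(streams[name]))
    dim = len(q[0])
    k_t = [list(col) for col in zip(*k)]     # transpose keys
    scores = matmul(q, k_t)                  # every token attends to all streams
    attn = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    mixed = matmul(attn, v)
    out, start = {}, 0
    for name, n in zip(names, sizes):        # split back into separate streams
        out[name] = mixed[start:start + n]
        start += n
    return out


rng = random.Random(0)
dim = 4
streams = {
    "text": random_matrix(2, dim, rng),          # prompt tokens
    "noisy_latent": random_matrix(3, dim, rng),  # latent being denoised
    "lr_latent": random_matrix(3, dim, rng),     # low-resolution control latent
}
weights = {name: tuple(random_matrix(dim, dim, rng) for _ in range(3)) for name in streams}
out = triple_flow_attention(streams, weights)
```

Keeping the low-resolution stream's weights separate is what would let such a control path be trained or removed without touching the pre-trained text and latent paths; whether EAM does exactly this is an assumption here.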
