Spatial Gram Alignment for Ultra-High-Resolution Image Synthesis

Jinjin Zhang; Xiefan Guo; Di Huang

arXiv:2605.20808 · cs.CV · May 21, 2026

TL;DR

The paper introduces Spatial Gram Alignment (SGA), a novel method that improves ultra-high-resolution image synthesis by aligning internal self-similarities of generative features with foundation model priors, preserving fidelity and structure.

Contribution

SGA offers a non-invasive spatial constraint approach that enhances large-scale latent diffusion models for ultra-high-resolution synthesis, outperforming existing methods.

Findings

01. Achieves state-of-the-art results in ultra-high-resolution text-to-image synthesis.

02. Effectively balances global structural coherence with fine-grained visual details.

03. Seamlessly integrates with existing pre-trained latent diffusion models.

Abstract

Modern ultra-high-resolution image synthesis relies heavily on the robust generative capacity of large-scale pre-trained Latent Diffusion Models (LDMs). While recent representation alignment methods have proven effective by distilling visual priors from foundation models (e.g., SAM or DINO) into generative latent features, scaling these approaches to pre-trained LDMs at extreme resolutions exposes a critical learnability-fidelity conflict. Specifically, forcing direct patch-wise feature distillation inherently perturbs the pre-trained latent manifold, ultimately leading to generation degradation. To address this bottleneck, we propose Spatial Gram Alignment (SGA), a novel framework that explicitly leverages the representation priors of vision foundation models while preserving the native generative capacity of LDMs. Moving beyond restrictive direct alignment, SGA imposes a non-invasive…
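
To make the contrast in the abstract concrete, here is a minimal Python sketch of the two kinds of objective it describes: direct patch-wise feature alignment versus alignment of spatial self-similarity (Gram) matrices. This is an illustration only, not the authors' implementation. The abstract is truncated before it specifies the loss, so the cosine-normalized Gram matrix, the mean-squared-error distance, and all function names (`spatial_gram`, `gram_alignment_loss`, `patchwise_loss`) are assumptions made for this example.

```python
import math

# Illustrative sketch only: the normalization, the distance, and every name
# below are assumptions, not details taken from the paper.

def l2_normalize(rows):
    """Scale each feature vector (one per spatial patch) to unit length."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r)) or 1.0  # guard against zero vectors
        out.append([x / norm for x in r])
    return out

def spatial_gram(features):
    """Return the N x N matrix of cosine similarities between N patch features."""
    f = l2_normalize(features)
    return [[sum(a * b for a, b in zip(u, v)) for v in f] for u in f]

def gram_alignment_loss(gen_features, ref_features):
    """Mean squared difference between the two self-similarity matrices."""
    g = spatial_gram(gen_features)
    r = spatial_gram(ref_features)
    n = len(g)
    total = sum((g[i][j] - r[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)

def patchwise_loss(gen_features, ref_features):
    """Direct per-patch alignment: 1 - cosine similarity, averaged over patches.

    Unlike the Gram loss, this needs both inputs to share a channel dimension.
    """
    g = l2_normalize(gen_features)
    r = l2_normalize(ref_features)
    per_patch = [1.0 - sum(a * b for a, b in zip(u, v)) for u, v in zip(g, r)]
    return sum(per_patch) / len(per_patch)

# Toy data: three patches with two channels each.
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# The same layout rotated by 90 degrees in feature space.
gen = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

gram_value = gram_alignment_loss(gen, ref)   # close to 0: pairwise relations match
patch_value = patchwise_loss(gen, ref)       # close to 1: each patch pair is orthogonal
```

In this toy case the generated features are a 90-degree rotation of the reference features, so every pairwise similarity between patches is preserved and the Gram loss is essentially zero, while the patch-wise loss sits at the value for orthogonal vectors. That is one plausible reading of why a Gram-style constraint can be less intrusive: it constrains relations between spatial locations rather than the feature values themselves, and because the Gram matrix is N x N it does not require the two feature spaces to share a channel dimension.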

Code & Models

Repositories

zhang0jhon/SGA (GitHub)

Models

zhang0jhon/SGA (Hugging Face)