Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation

Sangmin Jung; Utkarsh Nath; Yezhou Yang; Giulia Pedrielli; Joydeep Biswas; Amy Zhang; Hassan Ghasemzadeh; Pavan Turaga

arXiv:2505.12486 · cs.CV · May 20, 2025

Open Access

TL;DR

This paper introduces Deep Geometric Moments (DGM) as a new guidance method for diffusion-based text-to-image generation, balancing output fidelity and diversity by capturing subject-specific visual features through learned geometric priors.

Contribution

The paper proposes DGMs as a novel guidance approach that emphasizes subject-specific features and robustness, improving control over diffusion models compared to existing spatial guidance methods.

Findings

01

DGMs effectively balance control and diversity in image synthesis.

02

DGMs outperform existing guidance methods in maintaining subject fidelity.

03

DGMs are robust to pixel-wise perturbations, unlike ResNets.

Abstract

Text-to-image generation models have achieved remarkable capabilities in synthesizing images, but often struggle to provide fine-grained control over the output. Existing guidance approaches, such as segmentation maps and depth maps, introduce spatial rigidity that restricts the inherent diversity of diffusion models. In this work, we introduce Deep Geometric Moments (DGM) as a novel form of guidance that encapsulates the subject's visual features and nuances through a learned geometric prior. Compared to DINO or CLIP features, which suffer from overemphasis on global image features or semantics, DGMs focus specifically on the subject itself. Unlike ResNets, which are sensitive to pixel-wise perturbations, DGMs rely on robust geometric moments. Our experiments demonstrate that DGMs effectively balance control and diversity in diffusion-based image generation, allowing a flexible control…
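For readers unfamiliar with geometric moments, the sketch below computes classical raw moments m_pq = Σ x^p y^q I(x, y) of an image and takes one gradient step that nudges an image's moments toward a reference's. This is only a toy illustration under stated assumptions: DGMs are *learned* deep moments, whereas this uses fixed polynomial moments on normalized coordinates, and the function names (`geometric_moments`, `moment_guidance_step`, etc.) and the squared-error loss are hypothetical, not the paper's method. Because each raw moment is a weighted sum over all pixels, small zero-mean per-pixel noise tends to average out, which gives some intuition for the robustness claim above.

```python
import numpy as np

def moment_basis(h, w, max_order):
    """Hypothetical helper: stack of x^p * y^q maps for all p + q <= max_order."""
    # Normalized coordinates in [-1, 1] keep higher-order terms bounded.
    X, Y = np.meshgrid(np.linspace(-1.0, 1.0, w), np.linspace(-1.0, 1.0, h))
    orders = [(p, q) for p in range(max_order + 1)
              for q in range(max_order + 1 - p)]
    basis = np.stack([X**p * Y**q for p, q in orders])  # shape (K, h, w)
    return orders, basis

def geometric_moments(img, max_order=2):
    """Classical raw moments m_pq = sum_{x,y} x^p y^q I(x, y)."""
    orders, basis = moment_basis(*img.shape, max_order)
    return orders, np.tensordot(basis, img, axes=([1, 2], [0, 1]))

def moment_loss(img, target, max_order=2):
    """Assumed guidance objective: squared distance between moment vectors."""
    _, m = geometric_moments(img, max_order)
    return float(np.sum((m - target) ** 2))

def moment_guidance_step(img, target, lr=1e-3, max_order=2):
    """One gradient-descent step on moment_loss.

    Each m_pq is linear in the pixels, so dm_pq/dI(x, y) = x^p y^q and the
    gradient is sum_pq 2 (m_pq - target_pq) x^p y^q.
    """
    _, basis = moment_basis(*img.shape, max_order)
    m = np.tensordot(basis, img, axes=([1, 2], [0, 1]))
    grad = np.tensordot(2.0 * (m - target), basis, axes=(0, 0))  # shape (h, w)
    return img - lr * grad

# Usage: pull a random image's low-order moments toward a square "subject".
reference = np.zeros((8, 8))
reference[2:5, 2:5] = 1.0
_, target = geometric_moments(reference)
img = np.random.default_rng(0).random((8, 8))
before = moment_loss(img, target)
for _ in range(50):
    img = moment_guidance_step(img, target)
print(before, moment_loss(img, target))  # the second value is smaller
```

In an actual diffusion sampler, a guidance term of this kind would typically be applied to the model's intermediate estimate at each denoising step; the learned DGM features replace the fixed polynomial basis used here.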

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Model Reduction and Neural Networks

Methods: Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization · Vision Transformer · self-DIstillation with NO labels · Focus