StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation

Yinxi He; Kang Liao; Chunyu Lin; Tianyi Wei; Yao Zhao

arXiv:2604.12575 · cs.CV · April 15, 2026

TL;DR

StructDiff is a diffusion-based framework for single-image generation that preserves structural layout, offers spatial control through 3D positional encoding, and introduces an LLM-based evaluation criterion. It is reported to outperform existing methods in structural consistency and visual quality.

Contribution

StructDiff introduces an adaptive receptive field module to maintain both global and local distributions, employs 3D positional encoding as a spatial prior for spatial control, and proposes a new LLM-based evaluation criterion for single-image generation.

Findings

01 Outperforms existing methods in structural consistency and visual quality.

02 Enables flexible spatial control over generated content.

03 Demonstrates broad applicability across various image synthesis tasks.

Abstract

This paper introduces StructDiff, a generative framework based on a single-scale diffusion model for single-image generation. Single-image generation aims to synthesize diverse samples with similar visual content to the source image by capturing its internal statistics, without relying on external data. However, existing methods often struggle to preserve the structural layout, especially for images with large rigid objects or strict spatial constraints. Moreover, most approaches lack spatial controllability, making it difficult to guide the structure or placement of generated content. To address these challenges, StructDiff introduces an adaptive receptive field module to maintain both global and local distributions. Building on this foundation, StructDiff incorporates 3D positional encoding (PE) as a spatial prior, allowing flexible control over positions, scale, and local…
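To make the idea of a positional encoding used as a spatial prior more concrete, here is a minimal sketch of a generic sinusoidal encoding over three coordinates. It is not taken from the paper: the abstract does not say what the three dimensions are or how the encoding is computed, so the sinusoidal form, the normalized coordinates `(x, y, z)`, the `num_freqs` parameter, and all function names below are assumptions chosen only for illustration.

```python
import math


def sinusoidal_1d(position, num_freqs):
    """Encode one scalar coordinate as sin/cos pairs at doubling frequencies.

    Hypothetical helper: the paper's actual encoding may differ.
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * position))
        features.append(math.cos(freq * position))
    return features


def positional_encoding_3d(x, y, z, num_freqs=4):
    """Concatenate the per-axis encodings of three normalized coordinates.

    The meaning of the third axis is an assumption; the abstract does not
    specify it.
    """
    return (
        sinusoidal_1d(x, num_freqs)
        + sinusoidal_1d(y, num_freqs)
        + sinusoidal_1d(z, num_freqs)
    )


def encoding_grid(height, width, z, num_freqs=4):
    """Build an H x W grid of encodings at pixel centers for a fixed z."""
    grid = []
    for row in range(height):
        y = (row + 0.5) / height
        grid.append(
            [
                positional_encoding_3d((col + 0.5) / width, y, z, num_freqs)
                for col in range(width)
            ]
        )
    return grid


# Each pixel gets a 3 axes * 2 (sin, cos) * num_freqs feature vector.
grid = encoding_grid(height=4, width=6, z=0.5, num_freqs=3)
print(len(grid), len(grid[0]), len(grid[0][0]))
```

Under this sketch, shifting or rescaling the coordinates passed to `encoding_grid` changes the spatial prior each pixel receives, which is one plausible way position and scale could be steered. Whether StructDiff works this way is not stated in the abstract.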
