OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Ziye Li; Henghui Ding

arXiv:2605.21343·cs.CV·May 21, 2026

TL;DR

OcclusionFormer is a layout-to-image generation framework that explicitly models the occlusion order (Z-order) of overlapping objects. It is trained on a new occlusion-annotated dataset, SA-Z, and targets physically consistent layering in regions where bounding boxes intersect.

Contribution

The paper introduces SA-Z, a large-scale dataset with explicit occlusion ordering and pixel-level annotations, and OcclusionFormer, an occlusion-aware Diffusion Transformer that explicitly encodes Z-order to handle occlusion in layout-grounded image synthesis.

Findings

01. Reduces ambiguity in overlapping regions.

02. Enforces correct occlusion dependencies.

03. Achieves substantial accuracy improvements.

Abstract

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address this issue, we first construct SA-Z, a large-scale dataset enriched with explicit occlusion ordering and pixel-level annotations. Building upon our proposed dataset, we introduce OcclusionFormer, a novel occlusion-aware Diffusion Transformer framework that explicitly models Z-order priority by decoupling instances and compositing them via volume rendering. Furthermore, to ensure fine-grained…
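The abstract describes compositing decoupled instances by Z-order via volume rendering. As a rough illustration of that idea only, the sketch below applies standard front-to-back alpha compositing to flat-colored boxes. It is not the paper's implementation: the `Instance` class, the `composite` function, the single-channel "color", and the convention that a smaller `z` means closer to the camera are all assumptions made for this example, and the actual model presumably operates on learned features rather than pixel values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    """Hypothetical layout entry: a box plus an explicit Z-order."""
    name: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
    z: int                          # assumed convention: smaller z = nearer
    color: float                    # single-channel stand-in for appearance
    alpha: float = 1.0              # opacity inside the box


def composite(instances: List[Instance], width: int, height: int,
              background: float = 0.0) -> List[List[float]]:
    """Front-to-back alpha compositing (the discrete volume-rendering sum)."""
    ordered = sorted(instances, key=lambda inst: inst.z)  # nearest first
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            transmittance = 1.0  # fraction of the pixel not yet occluded
            value = 0.0
            for inst in ordered:
                x0, y0, x1, y1 = inst.box
                if x0 <= x < x1 and y0 <= y < y1:
                    value += transmittance * inst.alpha * inst.color
                    transmittance *= 1.0 - inst.alpha
            image[y][x] = value + transmittance * background
    return image


# Two overlapping boxes; the cat is declared to be in front of the sofa.
cat = Instance("cat", (0, 0, 3, 3), z=0, color=1.0)
sofa = Instance("sofa", (2, 2, 5, 5), z=1, color=0.5)
img = composite([sofa, cat], width=5, height=5)

# Pixel (2, 2) lies in both boxes; the explicit Z-order resolves it to the cat.
overlap_value = img[2][2]
```

Without the `z` field, the value at the shared pixel would depend on arbitrary list order, which is a toy version of the ambiguity in intersection regions that the abstract attributes to methods lacking explicit occlusion information.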

Code & Models

Repositories

fudancvl/OcclusionFormer (GitHub)
