Dependency-Aware Discrete Diffusion for Scene Graph Generation

Rajalaxmi Rajagopalan; Romit Roy Choudhury

arXiv:2605.09065·cs.CV·May 12, 2026

TL;DR

This paper introduces a dependency-aware discrete diffusion model for scene graph generation from natural language, improving structural fidelity and downstream image composition.

Contribution

The paper proposes a hierarchically constrained discrete diffusion model that decouples structure from semantics, so that generated scene graphs align more closely with the input text.

Findings

01. Outperforms existing graph generation baselines on standard benchmarks.

02. Enhances compositional alignment in downstream image generation.

03. Captures hierarchical dependencies in scene graphs effectively.

Abstract

Scene graphs (SGs) represent objects and their relationships as structured graphs, enabling applications in image generation, robotics, and 3D understanding. Recent work suggests that conditioning image generation on scene graphs improves compositional fidelity compared to text-only prompting. However, since users typically provide text rather than structured graphs, a key challenge is to generate scene graphs from natural language. Prior work on discrete diffusion has demonstrated success in generating generic graphs such as molecules and circuits, but fails to account for the hierarchical structure and strong dependencies between objects, edges, and relations in scene graphs. We address this limitation by introducing a dependency-aware, hierarchically constrained discrete diffusion model for scene graph generation. Our approach decouples structure and semantics across the forward and…
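To make the dependencies named in the abstract concrete, here is a minimal sketch of a scene graph as a data structure. It is not the authors' model or code: the `SceneGraph` class, its methods, and the example labels are hypothetical, chosen only to show that a relation label depends on an edge, and an edge depends on the two objects it connects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): objects are nodes, relations label directed edges."""

    objects: list[str] = field(default_factory=list)
    # Maps (subject_index, object_index) -> relation label.
    relations: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_object(self, label: str) -> int:
        """Add an object node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subj: int, obj: int, label: str) -> None:
        """Label a directed edge; both endpoints must already exist."""
        n = len(self.objects)
        if not (0 <= subj < n and 0 <= obj < n):
            raise ValueError("relation refers to a missing object")
        if subj == obj:
            raise ValueError("self-relations are not allowed in this sketch")
        self.relations[(subj, obj)] = label

    def triples(self) -> list[tuple[str, str, str]]:
        """Return (subject, relation, object) triples in insertion order."""
        return [
            (self.objects[s], rel, self.objects[o])
            for (s, o), rel in self.relations.items()
        ]


def build_example() -> SceneGraph:
    """Build a graph for the made-up caption 'a man riding a horse in a field'."""
    g = SceneGraph()
    man = g.add_object("man")
    horse = g.add_object("horse")
    meadow = g.add_object("field")
    g.add_relation(man, horse, "riding")
    g.add_relation(horse, meadow, "standing in")
    return g


if __name__ == "__main__":
    for triple in build_example().triples():
        print(triple)
```

The check in `add_relation` is the point of the sketch: a generator that samples objects, edges, and relation labels independently can produce a relation with no valid endpoints, which is the kind of inconsistency a dependency-aware model is meant to avoid.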
