Ontology-Guided Diffusion for Zero-Shot Visual Sim2Real Transfer

Mohamed Youssef; Mayar Elfares; Anna-Maria Meer; Matteo Bortoletto; Andreas Bulling

arXiv:2603.18719 · cs.CV · March 26, 2026

Open Access

TL;DR

This paper introduces Ontology-Guided Diffusion (OGD), a neuro-symbolic framework that leverages structured knowledge and graph neural networks to improve zero-shot sim2real image translation, outperforming existing diffusion methods.

Contribution

OGD is the first approach to incorporate an ontology of interpretable traits and a knowledge graph into diffusion models for structured, zero-shot sim2real transfer.

Findings

01

OGD distinguishes real from synthetic images better than baseline methods.

02

OGD outperforms state-of-the-art diffusion methods on sim2real translation benchmarks.

03

OGD enables interpretable and data-efficient zero-shot transfer.

Abstract

Bridging the simulation-to-reality (sim2real) gap remains challenging as labelled real-world data is scarce. Existing diffusion-based approaches rely on unstructured prompts or statistical alignment, which do not capture the structured factors that make images look real. We introduce Ontology-Guided Diffusion (OGD), a neuro-symbolic zero-shot sim2real image translation framework that represents realism as structured knowledge. OGD decomposes realism into an ontology of interpretable traits (such as lighting and material properties) and encodes their relationships in a knowledge graph. From a synthetic image, OGD infers trait activations and uses a graph neural network to produce a global embedding. In parallel, a symbolic planner uses the ontology traits to compute a consistent sequence of visual edits needed to narrow the realism gap. The graph embedding conditions a pretrained…
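The two parallel branches described in the abstract, a graph embedding computed from trait activations and a planner that orders visual edits, can be illustrated with a toy sketch. This is not the authors' implementation. The trait names, graph edges, prerequisite relations, the 0.5 threshold, and the convention that a low activation means "this trait looks unrealistic" are all assumptions made for illustration. Plain mean aggregation stands in for a learned graph neural network, and a topological sort stands in for the symbolic planner.

```python
from graphlib import TopologicalSorter

# All names and values below are hypothetical; none come from the paper.
TRAITS = ["lighting", "shadows", "material", "texture_noise"]

# Assumed knowledge-graph edges linking related traits (undirected).
EDGES = [
    ("lighting", "shadows"),
    ("lighting", "material"),
    ("material", "texture_noise"),
]

# Assumed edit prerequisites: trait -> traits that should be edited first.
PREREQUISITES = {
    "lighting": set(),
    "shadows": {"lighting"},
    "material": {"lighting"},
    "texture_noise": {"material"},
}


def neighbours(trait):
    """Return the traits connected to `trait` in the knowledge graph."""
    linked = []
    for a, b in EDGES:
        if a == trait:
            linked.append(b)
        elif b == trait:
            linked.append(a)
    return linked


def message_pass(state):
    """One round of mean aggregation over each trait and its neighbours."""
    updated = {}
    for trait in TRAITS:
        group = [trait] + neighbours(trait)
        updated[trait] = sum(state[g] for g in group) / len(group)
    return updated


def graph_embedding(activations, rounds=2):
    """Stand-in for a learned GNN: smooth activations, then read out a vector."""
    state = dict(activations)
    for _ in range(rounds):
        state = message_pass(state)
    return [state[trait] for trait in TRAITS]


def plan_edits(activations, threshold=0.5):
    """List weakly activated traits so that prerequisites come first."""
    order = TopologicalSorter(PREREQUISITES).static_order()
    return [trait for trait in order if activations[trait] < threshold]


if __name__ == "__main__":
    # Made-up activations for one synthetic image.
    activations = {
        "lighting": 0.2,
        "shadows": 0.9,
        "material": 0.3,
        "texture_noise": 0.1,
    }
    print("embedding:", graph_embedding(activations))
    print("edit plan:", plan_edits(activations))
```

In this sketch the embedding is just the vector of smoothed per-trait values. In a real system it would be a learned representation used to condition the diffusion model, a step that the sketch leaves out.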

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning