Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation

Akshay Krishnan; Xinchen Yan; Vincent Casser; Abhijit Kundu

arXiv:2501.13087 · cs.CV · August 26, 2025

TL;DR

Orchid is a unified latent diffusion model that jointly generates appearance (color) and geometry (depth and surface normal) images from text, improving efficiency and coherence over pipelines that use separate models.
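To make "jointly generates" concrete, the joint input can be pictured as one multi-channel image. The sketch below is a minimal illustration under assumptions not stated in the source: the 7-channel layout, the [-1, 1] scaling, the min-max depth normalization, and the helper name `pack_joint_image` are all hypothetical. Orchid's VAE would encode an image like this into a shared latent rather than diffuse it directly.

```python
import numpy as np


def pack_joint_image(rgb, depth, normals, eps=1e-6):
    """Stack RGB, relative depth and surface normals into one 7-channel array.

    Hypothetical layout (not taken from the paper): channels 0-2 hold RGB,
    channel 3 holds relative depth, channels 4-6 hold the normal vector,
    all scaled to roughly [-1, 1].
    """
    rgb = np.asarray(rgb, dtype=np.float32)          # (H, W, 3), values in [0, 1]
    depth = np.asarray(depth, dtype=np.float32)      # (H, W), arbitrary scale/shift
    normals = np.asarray(normals, dtype=np.float32)  # (H, W, 3), any length

    # Color: map [0, 1] to [-1, 1].
    rgb_n = rgb * 2.0 - 1.0

    # Relative depth: per-image min-max normalization removes scale and shift.
    d_min, d_max = float(depth.min()), float(depth.max())
    depth_n = (depth - d_min) / max(d_max - d_min, eps) * 2.0 - 1.0

    # Normals: rescale each vector to unit length so every component is in [-1, 1].
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals_n = normals / np.maximum(length, eps)

    return np.concatenate([rgb_n, depth_n[..., None], normals_n], axis=-1)


rng = np.random.default_rng(0)
rgb = rng.random((4, 5, 3))
depth = rng.random((4, 5)) * 10.0 + 3.0
normals = rng.standard_normal((4, 5, 3))

joint = pack_joint_image(rgb, depth, normals)
print(joint.shape)  # (4, 5, 7)
```

Min-max normalization makes the depth channel invariant to any positive scale and shift of the input depth, which matches the notion of relative depth used in the abstract; a percentile-based range would be more robust to outlier pixels.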

Contribution

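It introduces a VAE that jointly encodes RGB, relative depth, and surface normals into a shared latent space, together with a latent diffusion model over that space, so a single model handles text-to-image generation, monocular depth and normal estimation, and joint inpainting.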
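Inpainting "from a single model" means a hole is filled in color, depth, and normals at once. The source does not give Orchid's sampling procedure, so the sketch below swaps in a generic mask-replacement scheme in the style of RePaint (without its resampling loops): at each reverse step the observed latent is re-noised to the current level and pasted outside the mask. The latent shape, the linear noise schedule, and `placeholder_eps_model` (a stand-in for a trained denoiser) are hypothetical.

```python
import numpy as np


def ddpm_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and its cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def inpaint_joint_latent(z_known, mask, eps_model, T=50, seed=0):
    """Mask-replacement inpainting of a joint latent.

    z_known:   latent of the observed scene, shape (H, W, C)
    mask:      (H, W, 1) array, 1 where content is generated, 0 where it is kept;
               it broadcasts over every latent channel
    eps_model: callable (z_t, t) -> predicted noise with the same shape as z_t
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = ddpm_schedule(T)
    z = rng.standard_normal(z_known.shape)  # start from pure noise z_T
    for t in reversed(range(T)):
        # Ancestral DDPM step for the whole latent: z_t -> z_{t-1}.
        eps = eps_model(z, t)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            generated = mean + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
            # Noise the observed latent to level t-1 with q(z_{t-1} | z_0).
            known = (np.sqrt(alpha_bars[t - 1]) * z_known
                     + np.sqrt(1.0 - alpha_bars[t - 1]) * rng.standard_normal(z.shape))
        else:
            generated, known = mean, z_known
        # Keep observed content outside the mask, generated content inside it.
        z = mask * generated + (1.0 - mask) * known
    return z


def placeholder_eps_model(z_t, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return np.zeros_like(z_t)


rng = np.random.default_rng(1)
z_known = rng.standard_normal((8, 8, 4))  # hypothetical 4-channel joint latent
mask = np.zeros((8, 8, 1))
mask[2:6, 2:6] = 1.0                      # square hole to regenerate
out = inpaint_joint_latent(z_known, mask, placeholder_eps_model)
```

Because the mask is broadcast across every latent channel, everything stored at a masked location of the shared latent, whether it decodes to color, depth, or normals, is generated in the same sampling pass.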

Findings

01. Delivers geometry prediction accuracy competitive with state-of-the-art task-specific methods.
02. Produces more realistic joint inpainting of color, depth, and normals.
03. Handles both text-to-image generation and inpainting with a single model.

Abstract

We introduce Orchid, a unified latent diffusion model that learns a joint appearance-geometry prior to generate color, depth, and surface normal images in a single diffusion process. This unified approach is more efficient and coherent than current pipelines that use separate models for appearance and geometry. Orchid is versatile: it directly generates color, depth, and normal images from text, supports joint monocular depth and normal estimation with color-conditioned finetuning, and seamlessly inpaints large 3D regions by sampling from the joint distribution. It leverages a novel Variational Autoencoder (VAE) that jointly encodes RGB, relative depth, and surface normals into a shared latent space, combined with a latent diffusion model that denoises these latents. Our extensive experiments demonstrate that Orchid delivers competitive performance against SOTA task-specific methods…
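The abstract pairs the joint VAE with a latent diffusion model that denoises the shared latents. The sketch below shows the standard DDPM epsilon-prediction objective applied to a batch of such latents; it is an assumption that Orchid trains this way, and the latent shape, the linear schedule, and `placeholder_eps_model` are hypothetical.

```python
import numpy as np


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear DDPM noise schedule."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, T))


def joint_latent_diffusion_loss(z0, eps_model, alpha_bars, rng):
    """Epsilon-prediction loss for a batch of joint latents z0 of shape (B, ...).

    One timestep is drawn per sample and applied to every channel of that
    sample's latent, so appearance and geometry are noised together.
    """
    batch = z0.shape[0]
    t = rng.integers(0, len(alpha_bars), size=batch)
    # Reshape alpha_bar_t to (B, 1, 1, ...) so it broadcasts over the latent.
    ab = alpha_bars[t].reshape((batch,) + (1,) * (z0.ndim - 1))
    eps = rng.standard_normal(z0.shape)
    # Forward process q(z_t | z_0) = N(sqrt(ab) * z_0, (1 - ab) * I).
    z_t = np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * eps
    eps_hat = eps_model(z_t, t)
    return float(np.mean((eps_hat - eps) ** 2))


def placeholder_eps_model(z_t, t):
    """Hypothetical stand-in for the denoising network; always predicts zero noise."""
    return np.zeros_like(z_t)


rng = np.random.default_rng(0)
alpha_bars = linear_alpha_bars()
z0 = rng.standard_normal((2, 8, 8, 4))  # hypothetical batch of joint VAE latents
loss = joint_latent_diffusion_loss(z0, placeholder_eps_model, alpha_bars, rng)
```

With the zero-noise placeholder the loss reduces to the mean squared value of the sampled noise, roughly 1; a trained denoiser would drive it lower. In this sketch, drawing one timestep per sample rather than per modality is what keeps appearance and geometry at the same noise level.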

Taxonomy

Topics: Face recognition and analysis

Methods: Diffusion · Latent Diffusion Model