Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan, Xinchen Yan, Vincent Casser, Abhijit Kundu

TL;DR
Orchid is a unified latent diffusion model that jointly generates appearance and geometry images such as color, depth, and normals from text, improving efficiency and coherence over separate models.
Contribution
It introduces a novel joint encoding and diffusion approach for appearance and geometry, enabling versatile image generation and inpainting from a single model.
Findings
Delivers competitive performance against state-of-the-art task-specific geometry prediction methods.
Achieves more realistic joint inpainting of color, depth, and normals.
Demonstrates versatility in text-to-image and inpainting tasks.
Abstract
We introduce Orchid, a unified latent diffusion model that learns a joint appearance-geometry prior to generate color, depth, and surface normal images in a single diffusion process. This unified approach is more efficient and coherent than current pipelines that use separate models for appearance and geometry. Orchid is versatile: it directly generates color, depth, and normal images from text, supports joint monocular depth and normal estimation with color-conditioned finetuning, and seamlessly inpaints large 3D regions by sampling from the joint distribution. It leverages a novel Variational Autoencoder (VAE) that jointly encodes RGB, relative depth, and surface normals into a shared latent space, combined with a latent diffusion model that denoises these latents. Our extensive experiments demonstrate that Orchid delivers competitive performance against SOTA task-specific methods…
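The abstract's central idea, one shared latent for color, depth, and normals that a single diffusion process operates on, can be illustrated with a toy sketch. This is not Orchid's implementation. The 7-value per-pixel layout, the random linear map standing in for the VAE encoder, the 4-dimensional latent, and the DDPM-style noising formula are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def pack_pixel(rgb, depth, normal):
    """Concatenate one pixel's modalities: 3 color + 1 depth + 3 normal values."""
    return [*rgb, depth, *normal]


def unpack_pixel(vec):
    """Inverse of pack_pixel: split a 7-value vector back into its modalities."""
    return vec[:3], vec[3], vec[4:7]


def toy_encode(vec, weights):
    """Linear map standing in for a joint VAE encoder (hypothetical)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def add_noise(latent, alpha_bar, rng):
    """DDPM-style forward noising: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * z + sigma * rng.gauss(0.0, 1.0) for z in latent]


rng = random.Random(0)

# One pixel: an orange color, mid-range relative depth, normal facing the camera.
pixel = pack_pixel(rgb=(0.8, 0.4, 0.2), depth=0.5, normal=(0.0, 0.0, 1.0))

# Assumed 7 -> 4 projection; a real encoder would be a learned convolutional network.
weights = [[rng.uniform(-1.0, 1.0) for _ in range(7)] for _ in range(4)]
latent = toy_encode(pixel, weights)

# A single noising step perturbs the shared latent, so all three modalities
# are corrupted (and would later be denoised) together.
noisy = add_noise(latent, alpha_bar=0.5, rng=rng)
```

Because the noise is applied to the shared latent and not to each modality separately, a denoiser trained on such latents has to model color and geometry jointly, which is the property the abstract credits for coherent generation and inpainting.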
Taxonomy
Topics: Face recognition and analysis
Methods: Diffusion · Latent Diffusion Model
