LDM3D-VR: Latent Diffusion Model for 3D VR
Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

TL;DR
LDM3D-VR is a suite of two diffusion models for VR content creation: LDM3D-pano generates panoramic RGBD images from text prompts, and LDM3D-SR upscales low-resolution inputs to high-resolution RGBD.
Contribution
It presents novel diffusion models specifically designed for joint RGB and depth map generation and upscaling in virtual reality applications.
Findings
Successful generation of panoramic RGBD from text prompts
Effective upscaling of low-resolution RGBD images
Models outperform existing related methods
Abstract
Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.
Taxonomy
Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
