JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling

Jingyang Zhang; Shiwei Li; Yuanxun Lu; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan; Yao Yao

arXiv:2310.06347 · cs.CV · October 11, 2023 · 2 cites

Open Access

TL;DR

JointNet is a new neural network architecture that extends pre-trained text-to-image diffusion models to jointly model images and dense modalities like depth maps, enabling diverse applications with efficient training.

Contribution

The paper extends a pre-trained text-to-image diffusion model to a dense modality by adding a parallel branch that is densely connected to the RGB branch, while keeping the original RGB branch fixed.

Findings

01 Effective joint RGBD generation demonstrated

02 High-quality dense depth prediction achieved

03 Versatile applications including depth-conditioned image generation and 3D panorama synthesis

Abstract

We introduce JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). JointNet is extended from a pre-trained text-to-image diffusion model, where a copy of the original network is created for the new dense modality branch and is densely connected with the RGB branch. The RGB branch is locked during network fine-tuning, which enables efficient learning of the new modality distribution while maintaining the strong generalization ability of the large-scale pre-trained diffusion model. We demonstrate the effectiveness of JointNet by using RGBD diffusion as an example and through extensive experiments, showcasing its applicability in a variety of applications, including joint RGBD generation, dense depth prediction, depth-conditioned image generation, and coherent tile-based 3D panorama generation.
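
The branch layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: scalar layers stand in for the network blocks of the diffusion model, and the names `Layer`, `JointSketch`, `rgb_to_dense`, and `dense_to_rgb` are hypothetical. Zero-initialized cross-branch links are an assumption of this sketch, chosen so that both branches initially reproduce the pre-trained network.

```python
class Layer:
    """Toy stand-in for one network block: y = weight * x + bias."""

    def __init__(self, weight, bias, trainable=True):
        self.weight = weight
        self.bias = bias
        self.trainable = trainable

    def __call__(self, x):
        return self.weight * x + self.bias


class JointSketch:
    """Hypothetical two-branch model; illustrative only, not the authors' code."""

    def __init__(self, pretrained_layers):
        # RGB branch: the pre-trained layers, locked during fine-tuning.
        self.rgb = pretrained_layers
        for layer in self.rgb:
            layer.trainable = False
        # Dense-modality branch: a trainable copy of the pre-trained layers.
        self.dense = [Layer(l.weight, l.bias, trainable=True) for l in self.rgb]
        # Cross-branch links, one pair per layer. Zero initialization is an
        # assumption of this sketch, so both branches start as the original.
        self.rgb_to_dense = [0.0] * len(self.rgb)
        self.dense_to_rgb = [0.0] * len(self.rgb)

    def forward(self, x_rgb, x_dense):
        for i in range(len(self.rgb)):
            # Each branch sees its own feature plus a weighted feature
            # from the other branch before applying its layer.
            next_rgb = self.rgb[i](x_rgb + self.dense_to_rgb[i] * x_dense)
            next_dense = self.dense[i](x_dense + self.rgb_to_dense[i] * x_rgb)
            x_rgb, x_dense = next_rgb, next_dense
        return x_rgb, x_dense


model = JointSketch([Layer(2.0, 1.0), Layer(0.5, -1.0)])
out_rgb, out_dense = model.forward(3.0, 3.0)
```

In this toy setup only the copied layers and the cross-branch link weights would be updated during fine-tuning; the weights of the `rgb` layers stay at their pre-trained values, which mirrors the locked RGB branch described above.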

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion