BIFRÖST: 3D-Aware Image compositing with Language Instructions
Lingxiao Li, Kaixiong Gong, Weihong Li, Xili Dai, Tao Chen, Xiaojun Yuan, Xiangyu Yue

TL;DR
BIFRÖST is a 3D-aware image compositing framework using diffusion models that incorporates depth and spatial reasoning from language instructions, enabling realistic and complex image compositions.
Contribution
It introduces a novel 3D-aware compositing method that integrates depth maps and language understanding, improving spatial handling over previous 2D-only approaches.
Findings
Outperforms existing methods in qualitative and quantitative evaluations.
Effectively models complex spatial relationships including occlusion.
Reduces the need for extensive annotated datasets.
Abstract
This paper introduces Bifröst, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. Previous methods concentrate on image compositing at the 2D level, which falls short in handling complex spatial relationships (e.g., occlusion). Bifröst addresses these issues by training an MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process to bridge the gap between 2D and 3D, which enhances spatial comprehension and supports sophisticated spatial interactions. Our method begins by fine-tuning the MLLM with a custom counterfactual dataset to predict 2.5D object locations in complex backgrounds from language instructions. Then, the image-compositing model is uniquely designed to process multiple types of input features, enabling it to perform high-fidelity image compositions…
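To illustrate why a depth value helps with occlusion, the toy sketch below compares a single object depth against a background depth map and marks where the inserted object would remain visible. It is not the paper's model: Bifröst feeds depth to a diffusion model as a condition, whereas this is a plain per-pixel depth test. The `Location25D` class, the `visibility_mask` function, and the convention that smaller depth means closer to the camera are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Location25D:
    """Hypothetical 2.5D location: a 2D box plus one depth value."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float  # assumed convention: smaller means closer to the camera


def visibility_mask(bg_depth: List[List[float]], loc: Location25D) -> List[List[bool]]:
    """Mark pixels inside the box where the object is nearer than the background."""
    h, w = len(bg_depth), len(bg_depth[0])
    mask = [[False] * w for _ in range(h)]
    # Clamp the box to the image bounds, then run a per-pixel depth test.
    for y in range(max(0, loc.y0), min(h, loc.y1)):
        for x in range(max(0, loc.x0), min(w, loc.x1)):
            mask[y][x] = loc.depth < bg_depth[y][x]
    return mask


# Toy background: a far wall (depth 9) with a near pole (depth 2) in column 1.
bg_depth = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 2.0, 9.0, 9.0],
    [9.0, 2.0, 9.0, 9.0],
]
# Object placed at depth 5: in front of the wall, behind the pole.
loc = Location25D(x0=1, y0=0, x1=3, y1=3, depth=5.0)
mask = visibility_mask(bg_depth, loc)
for row in mask:
    print(row)
```

In this toy scene the object is hidden wherever the pole is nearer than it, which is the kind of relationship a purely 2D bounding box cannot express.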
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Diffusion
