CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding

Jingyu Liu; Wenhan Xiong; Ian Jones; Yixin Nie; Anchit Gupta; Barlas Oğuz

arXiv:2303.03565 · cs.CV · June 5, 2023 · 1 citation

TL;DR

This paper introduces CLIP-Layout, a novel indoor scene synthesis model that uses CLIP embeddings for instance-level predictions, enabling style-consistent, visually coherent, and zero-shot text-guided scene generation.

Contribution

It presents an auto-regressive model that leverages CLIP embeddings for instance-level furniture selection and placement, surpassing previous methods that relied on category labels and ignored visual attributes.

Findings

01

Achieves state-of-the-art results on the 3D-FRONT dataset.

02

Improves auto-completion metrics by over 50%.

03

Enables zero-shot text-guided scene editing.

Abstract

Indoor scene synthesis involves automatically picking and placing furniture appropriately on a floor plan, so that the scene looks realistic and is functionally plausible. Such scenes can serve as homes for immersive 3D experiences, or be used to train embodied agents. Existing methods for this task rely on labeled categories of furniture, e.g. bed, chair or table, to generate contextually relevant combinations of furniture. Whether heuristic or learned, these methods ignore instance-level visual attributes of objects, and as a result may produce visually less coherent scenes. In this paper, we introduce an auto-regressive scene model which can output instance-level predictions, using general purpose image embedding based on CLIP. This allows us to learn visual correspondences such as matching color and style, and produce more functionally plausible and aesthetically pleasing scenes.…
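The idea of instance-level prediction in an embedding space can be illustrated with a small retrieval sketch. It assumes, hypothetically, that the model emits one embedding vector per predicted object and that a concrete furniture item is chosen by cosine similarity against a catalog of precomputed embeddings; the abstract above does not spell out this step. The catalog names, the 3-dimensional vectors, and the functions `cosine_similarity` and `retrieve_instance` are all made up for illustration (real CLIP embeddings have hundreds of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_instance(predicted, catalog):
    """Return the catalog key whose embedding is most similar to `predicted`."""
    if not catalog:
        raise ValueError("catalog is empty")
    return max(catalog, key=lambda name: cosine_similarity(predicted, catalog[name]))


# Hypothetical toy catalog: item name -> stand-in for an image embedding.
catalog = {
    "oak_chair": [0.9, 0.1, 0.0],
    "steel_chair": [0.1, 0.9, 0.0],
    "oak_table": [0.8, 0.0, 0.6],
}

# Hypothetical model output for the next object to place.
predicted = [1.0, 0.2, 0.1]
best = retrieve_instance(predicted, catalog)
print(best)
```

Because the lookup works on embeddings rather than category labels, two chairs of different material land far apart in this toy space, which is the kind of distinction a category-only model cannot make. Under the same assumption, a text embedding from a shared image-text space could be passed as `predicted` to steer retrieval with a prompt.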

Code & Models

Repositories

chenguolin/InstructScene
pytorch

Models

🤗
dong0625/instruct_scene
model

Taxonomy

Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Language-Image Pre-training