Compositional Scene Modeling with Global Object-Centric Representations

Tonglin Chen; Bin Li; Zhimeng Shen; Xiangyang Xue

arXiv:2211.11500·cs.CV·November 28, 2022

TL;DR

This paper introduces an unsupervised compositional scene modeling approach that infers global object representations by separating intrinsic and extrinsic features, enabling better object identification and reconstruction in complex scenes.

Contribution

It proposes a novel method to learn canonical object representations without supervision by combining patch-matching and variational inference techniques.

Findings

01. Outperforms state-of-the-art methods in segmentation and reconstruction

02. Achieves accurate global object identification

03. Demonstrates robustness across multiple benchmarks

Abstract

The appearance of the same object may vary across scene images because of differences in perspective and occlusions between objects. Humans can easily identify the same object, even when it is occluded, by completing the occluded parts from its canonical image in memory. Achieving this ability remains a challenge for machine learning, especially in the unsupervised learning setting. Inspired by this human ability, this paper proposes a compositional scene modeling method that infers global representations of canonical images of objects without any supervision. The representation of each object is divided into an intrinsic part, which characterizes globally invariant information (i.e., the canonical representation of an object), and an extrinsic part, which characterizes scene-dependent information (e.g., position and size). To infer the intrinsic representation of each object, we employ…
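
To make the intrinsic/extrinsic split described in the abstract concrete, here is a minimal Python sketch. It is not the paper's model: the names `Intrinsic`, `Extrinsic`, `render`, and `compose`, the integer `scale` factor, and the back-to-front compositing rule are all assumptions chosen for illustration, and no inference or learning is shown. It only illustrates how one canonical (scene-independent) object description could be reused across scenes with different scene-dependent placements.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Grid = List[List[float]]


@dataclass(frozen=True)
class Intrinsic:
    """Scene-independent part: a canonical appearance and its shape mask."""
    appearance: Grid  # canonical grayscale image, values in [0, 1]
    mask: Grid        # 1.0 where the object is present, 0.0 elsewhere


@dataclass(frozen=True)
class Extrinsic:
    """Scene-dependent part: where and how large the object appears."""
    top: int
    left: int
    scale: int  # integer upscaling factor (a hypothetical simplification)


def render(canvas: Grid, intrinsic: Intrinsic, extrinsic: Extrinsic) -> Grid:
    """Paste the scaled canonical image onto a copy of the canvas."""
    out = [row[:] for row in canvas]
    height, width = len(out), len(out[0])
    rows = zip(intrinsic.appearance, intrinsic.mask)
    for r, (app_row, mask_row) in enumerate(rows):
        for c, (value, alpha) in enumerate(zip(app_row, mask_row)):
            # Nearest-neighbour upscaling: each canonical pixel covers
            # a scale x scale block in the scene.
            for dr in range(extrinsic.scale):
                for dc in range(extrinsic.scale):
                    y = extrinsic.top + r * extrinsic.scale + dr
                    x = extrinsic.left + c * extrinsic.scale + dc
                    if 0 <= y < height and 0 <= x < width:
                        out[y][x] = alpha * value + (1.0 - alpha) * out[y][x]
    return out


def compose(height: int, width: int,
            objects: Sequence[Tuple[Intrinsic, Extrinsic]]) -> Grid:
    """Compose a scene back to front: later objects occlude earlier ones."""
    canvas = [[0.0] * width for _ in range(height)]
    for intrinsic, extrinsic in objects:
        canvas = render(canvas, intrinsic, extrinsic)
    return canvas


# The same canonical objects appear in two scenes with different placements.
bright = Intrinsic(appearance=[[1.0]], mask=[[1.0]])
dim = Intrinsic(appearance=[[0.5]], mask=[[1.0]])

scene_a = compose(3, 3, [(bright, Extrinsic(0, 0, 2)), (dim, Extrinsic(1, 1, 1))])
scene_b = compose(3, 3, [(bright, Extrinsic(2, 2, 1))])
```

In `scene_a` the dim object partly occludes the bright one, yet both scenes share the same `bright` intrinsic value; recovering that shared canonical description from occluded views, without supervision, is the problem the abstract describes.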

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: ALIGN