Compositional Scene Modeling with Global Object-Centric Representations
Tonglin Chen, Bin Li, Zhimeng Shen, Xiangyang Xue

TL;DR
This paper introduces an unsupervised compositional scene modeling approach that infers global object representations by separating intrinsic and extrinsic features, enabling better object identification and reconstruction in complex scenes.
Contribution
The paper proposes a method for learning canonical object representations without supervision by combining patch matching with variational inference.
Findings
Outperforms state-of-the-art methods in segmentation and reconstruction
Achieves accurate global object identification
Demonstrates robustness across multiple benchmarks
Abstract
The appearance of the same object may vary across scene images because of differences in perspective and occlusions between objects. Humans can easily identify the same object, even when it is occluded, by completing the occluded parts based on its canonical image in memory. Achieving this ability is still a challenge for machine learning, especially in the unsupervised learning setting. Inspired by this human ability, the paper proposes a compositional scene modeling method that infers global representations of canonical images of objects without any supervision. The representation of each object is divided into an intrinsic part, which characterizes globally invariant information (i.e., the canonical representation of an object), and an extrinsic part, which characterizes scene-dependent information (e.g., position and size). To infer the intrinsic representation of each object, we employ…
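The intrinsic/extrinsic split described in the abstract can be illustrated with a small toy sketch. This is not the paper's model: the canonical images here are written by hand rather than inferred, and every name (`SceneObject`, `resize_nearest`, `compose`, `canonical_id`) is hypothetical. It only shows how a globally shared canonical image (intrinsic) and per-scene position and size (extrinsic) could combine into a scene with occlusion, assuming square grayscale images and objects that fit entirely inside the canvas.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class SceneObject:
    # Intrinsic part (assumption): an index into a library of canonical
    # images that is shared across all scenes.
    canonical_id: int
    # Extrinsic part (assumption): scene-dependent position and size.
    top: int
    left: int
    size: int  # side length in pixels after scaling


def resize_nearest(img, size):
    """Nearest-neighbor resize of a 2D array to (size, size)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def compose(library, objects, canvas_size):
    """Paste objects back to front; later objects occlude earlier ones.

    Assumes each object lies fully inside the canvas and that pixel
    value 0 means "transparent" in a canonical image.
    """
    canvas = np.zeros((canvas_size, canvas_size))
    for obj in objects:
        patch = resize_nearest(library[obj.canonical_id], obj.size)
        mask = patch > 0
        # Basic slicing returns a view, so writing into it updates canvas.
        region = canvas[obj.top:obj.top + obj.size,
                        obj.left:obj.left + obj.size]
        region[mask] = patch[mask]
    return canvas


# Two canonical images, reused with different extrinsic attributes.
library = [np.ones((2, 2)), 2 * np.ones((2, 2))]
objects = [SceneObject(0, top=0, left=0, size=4),
           SceneObject(1, top=2, left=2, size=4)]
scene = compose(library, objects, canvas_size=8)
```

In this toy setup the same entry of `library` can appear in many scenes with different `top`, `left`, and `size` values, which is the sense in which the intrinsic part is global and the extrinsic part is scene-dependent.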
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: ALIGN
